Components

This part explains how to make new components. You need to know the following definitions first:

seed: a seed is a computer application or program that performs a specific task. This can be a simple bash command or a software suite like mutationSeq or Titan.

component: a component is a building block, i.e. an individual part, of a pipeline generated by kronos. For a seed to be compatible with kronos, i.e. usable as part of a pipeline, it must be wrapped according to some specifications. Once the seed is wrapped properly, it is called a component and is ready to use with kronos.

Develop a component

A component is basically a wrapper around a seed. To develop a component, you need to create a particular directory structure in which there are specific files required to wrap the seed.

In brief, to develop a component you need to:

  1. create a directory named the same as the component name, e.g. my_comp

  2. create a directory named component_seed inside the my_comp directory and copy the seed source code into it

  3. create the following files inside the my_comp directory:

       • __init__.py
       • component_main.py
       • component_params.py
       • component_reqs.py
       • component_ui.py

Tip

All the above files and directories are generated by the make_component command. The user only needs to customize them.

Therefore, the component directory tree looks like the following:

|-- <component_name>
|   |-- __init__.py
|   |-- component_main.py
|   |-- component_params.py
|   |-- component_reqs.py
|   |-- component_ui.py
|   |-- component_seed

Note

<component_name> should be replaced with the actual name of the component, e.g. my_comp. The rest of the file and directory names should be exactly as shown above.
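
The layout above can be checked with a short script. The following is a minimal sketch and not part of kronos: the helper name missing_entries is hypothetical, and the expected names are simply copied from the directory tree shown above.

```python
import os

# File and directory names copied from the component tree above.
REQUIRED_FILES = [
    "__init__.py",
    "component_main.py",
    "component_params.py",
    "component_reqs.py",
    "component_ui.py",
]
REQUIRED_DIRS = ["component_seed"]


def missing_entries(component_dir):
    """Return the expected files and directories absent from component_dir."""
    missing = []
    for name in REQUIRED_FILES:
        if not os.path.isfile(os.path.join(component_dir, name)):
            missing.append(name)
    for name in REQUIRED_DIRS:
        if not os.path.isdir(os.path.join(component_dir, name)):
            missing.append(name)
    return missing
```

Calling missing_entries('my_comp') returns an empty list when every expected entry is present, and the names of the absent entries otherwise.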

Tip

It is recommended to add the following files and directories (not generated automatically) as well:

  • component_test: a directory where all the test files exist
  • README: a readme file to provide more information about the component

Component_main

The core of a component wrapper is the component_main.py Python script. This module defines the Component class, which extends the ComponentAbstract class.

Using the make_component command, the following component_main.py file is generated:

"""
component_main.py
This module contains Component class which extends
the ComponentAbstract class. It is the core of a component.

Note the places you need to change to make it work for you.
They are marked with keyword 'TODO'.
"""

from kronos.utils import ComponentAbstract
import os


class Component(ComponentAbstract):

    """
    TODO: add component doc here.
    """

    def __init__(self, component_name="my_comp",
                 component_parent_dir=None, seed_dir=None):

        ## TODO: pass the version of the component here.
        self.version = "v0.99.0"

        ## initialize ComponentAbstract
        super(Component, self).__init__(component_name,
                                        component_parent_dir, seed_dir)

    ## TODO: write the focus method if the component is parallelizable.
    ## Note that it should return cmd, cmd_args.
    def focus(self, cmd, cmd_args, chunk):
        pass
    #    return cmd, cmd_args

    ## TODO: this method should make the command and command arguments
    ## used to run the component_seed via the command line. Note that
    ## it should return cmd, cmd_args.
    def make_cmd(self, chunk=None):
        ## TODO: replace 'comp_req' with the actual component
        ## requirement, e.g. 'python', 'java', etc.
        cmd = self.requirements['comp_req']

        cmd_args = []

        args = vars(self.args)

        ## TODO: fill the following component params to seed params dictionary
        ## if the name of parameters of the seed are different than
        ## component parameter names.
        comp_seed_map = {
                         #e.g. 'component_param1': 'seedParam1',
                         #e.g. 'component_param2': 'seedParam2',
                        }

        for k, v in args.items():
            if v is None or v is False:
                continue

            ## TODO: uncomment the next line if you are using
            ## comp_seed_map dictionary.
            # k = comp_seed_map[k]

            cmd_args.append('--' + k)

            if isinstance(v, bool):
                continue
            if isinstance(v, str):
                v = repr(v)
            if isinstance(v, (list, tuple)):
                cmd_args.extend(v)
            else:
                cmd_args.extend([v])

        if chunk is not None:
            cmd, cmd_args = self.focus(cmd, cmd_args, chunk)

        return cmd, cmd_args

## To run as stand alone
def _main():
    c = Component()
    c.args = component_ui.args
    c.run()

if __name__ == '__main__':
    import component_ui
    _main()

Note

The places in the generated file that you need to change to make it work for you are marked with the keyword 'TODO'.

There are two methods in this file that you need to customize:

focus method

Each parallelizable component requires a focus method. The purpose of this method is to make the component process only one chunk of the input data rather than the entire file. How this is done varies from component to component, but in general the method adds to, or alters, the component command. For example, the following implementation simply passes the chunk to the --interval option in the command arguments cmd_args (most of the time, this implementation does the job):

def focus(self, cmd, cmd_args, chunk):
    cmd_args.append('--interval ' + chunk)
    return cmd, cmd_args

Note

You need to implement the focus method only if the component is parallelizable.
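
To see the effect of such a method on the command, here is a minimal stand-alone sketch. It is an illustration only: the chunk values and the base command are made up, and how kronos chooses chunks and invokes the method is not shown. Unlike the method above, this version copies the argument list instead of appending in place, so that the same base arguments can be reused for every chunk.

```python
def focus(cmd, cmd_args, chunk):
    """Stand-alone variant of a focus method, for illustration only."""
    # Copy the list so that every chunk starts from the same base arguments.
    focused_args = list(cmd_args) + ['--interval ' + chunk]
    return cmd, focused_args


base_cmd = 'python my_seed_command.py'
base_args = ['--foo', 'data1']

# Hypothetical chunk names; the real values depend on the pipeline.
for chunk in ['1', '2', 'X']:
    cmd, args = focus(base_cmd, base_args, chunk)
    print(cmd + ' ' + ' '.join(args))
```

Each printed line is the base command with a different --interval value appended.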

make_cmd method

All components should implement this method in their component_main.py. The method returns the command and the command arguments (cmd, cmd_args) that one can use to run the seed on the command line. For example, if the seed can be run using the following command:

python my_seed_command.py --foo data1 --bar data2

then the make_cmd method would look like this (note that only the first two lines of the method body generated by kronos need to change):

def make_cmd(self, chunk=None):
    path = os.path.join(self.seed_dir, 'my_seed_command.py')
    cmd = self.requirements['python'] + ' ' + path

    cmd_args = []

    args = vars(self.args)

    ## TODO: fill the following component params to seed params dictionary
    ## if the name of parameters of the seed are different than
    ## component parameter names.
    comp_seed_map = {
                     #e.g. 'component_param1': 'seedParam1',
                     #e.g. 'component_param2': 'seedParam2',
                    }

    for k, v in args.items():
        if v is None or v is False:
            continue

        ## TODO: uncomment the next line if you are using
        ## comp_seed_map dictionary.
        # k = comp_seed_map[k]

        cmd_args.append('--' + k)

        if isinstance(v, bool):
            continue
        if isinstance(v, str):
            v = repr(v)
        if isinstance(v, (list, tuple)):
            cmd_args.extend(v)
        else:
            cmd_args.extend([v])

    if chunk is not None:
        cmd, cmd_args = self.focus(cmd, cmd_args, chunk)

    return cmd, cmd_args

Tip

In the above example, python is a requirement for the component and should be added to the component_reqs.py of the component. Also, parameters foo and bar should be added to the component_params.py.
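
The argument loop in make_cmd can be tried outside of kronos. The sketch below mirrors that loop as a plain function; the name build_cmd_args and the sample values are hypothetical. One deliberate difference: it falls back to the component parameter name when a key is absent from comp_seed_map, whereas the generated code indexes the dictionary directly.

```python
def build_cmd_args(args, comp_seed_map=None):
    """Mirror the argument loop of the generated make_cmd method."""
    comp_seed_map = comp_seed_map or {}
    cmd_args = []
    for k, v in args.items():
        # Unset options and switched-off flags are left out entirely.
        if v is None or v is False:
            continue
        # Translate the component parameter name to the seed name if mapped.
        k = comp_seed_map.get(k, k)
        cmd_args.append('--' + k)
        # A flag that is switched on contributes only its name.
        if isinstance(v, bool):
            continue
        # Strings are quoted with repr, as in the generated code.
        if isinstance(v, str):
            v = repr(v)
        if isinstance(v, (list, tuple)):
            cmd_args.extend(v)
        else:
            cmd_args.append(v)
    return cmd_args


# Hypothetical parameter values, standing in for vars(self.args).
example = {'foo': 'data1', 'bar': None, 'deep': True,
           'coverage': 4, 'samples': ['a.bam', 'b.bam']}
print(build_cmd_args(example, {'foo': 'fooParam'}))
```

Here bar is skipped because it is None, deep appears as a bare switch, and foo is renamed to fooParam through the map.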

ComponentAbstract class

This class consists of the following attributes and methods:

Attributes:

Attribute         Description
----------------  ------------------------------------------------------------
args              the argparse namespace containing all the input arguments from the Component_ui module
components_dir    path to the directory where the component exists
component_name    name of the component - specific
component_params  Component_params module of the component
component_reqs    Component_reqs module of the component
env_vars          see Component_reqs
memory            see Component_reqs
parallel          see Component_reqs
requirements      see Component_reqs
seed_dir          path to the directory where the seed exists. Most of the time it is <component_name>/component_seed
seed_version      version of the seed
version           version of the component - specific

Tip

specific means the attribute should be assigned a value when implementing a component.

Methods:

Method    Description
--------  ------------------------------------------------------------
__init__  initialize general attributes that each component must have
run       run the component command locally
focus     update the command and command arguments for each chunk - virtual
make_cmd  make the command used to run the seed of the component. This returns the same command that one would use to run the component as a stand alone program via command line - virtual
test      run unittest of the component - virtual

Tip

The class can be imported from the utils module of the kronos package:

from kronos.utils import ComponentAbstract

Component_params

This is a python module and contains the following information:

  • input_files: a dictionary whose keys are the input file parameters and whose values are the default values or the proper flags based on the component UI. For example:
input_files={'samples':['tumour:__REQUIRED__',
                        'normal:__REQUIRED__',
                        'reference:__REQUIRED__',
                        'model:__REQUIRED__'
                        ],
             'config':'some_default.cfg',
             'positions_file':None
             }

Note

This dictionary includes only parameters that expect input files or directories.

  • output_files: a dictionary whose keys are the output file parameters and whose values are the default values or the proper flags based on the component UI. For example:
output_files = {'export_features':None,
                'log_file':'mutationSeq_run.log',
                'out':None
                }

Note

This dictionary includes only parameters that expect output files or directories.

  • input_params: a dictionary whose keys are the input non-file parameters and whose values are the default values or the proper flags based on the component UI. For example:
input_params = {'all':'__FLAG__',
                'buffer_size':'2G',
                'coverage':4,
                'deep':'__FLAG__',
                'interval':None,
                'no_filter':'__FLAG__'
                }

Note

All other parameters that are not included in input_files and output_files should be listed in input_params.

Using the make_component command, the following component_params.py file is generated:

"""
component_params.py

Note the places you need to change to make it work for you.
They are marked with keyword 'TODO'.
"""

## TODO: here goes the list of the input files. Use flags:
## '__REQUIRED__' to make it required
## '__FLAG__' to make it a flag or switch.
input_files  = {
#                 'input_file1' : '__REQUIRED__',
#                 'input_file2' : None
                }

## TODO: here goes the list of the output files.
output_files = {
#                 'output_file1' : '__REQUIRED__',
#                 'output_file1' : None
                }

## TODO: here goes the list of the input parameters excluding input/output files.
input_params = {
#                 'input_param1' : '__REQUIRED__',
#                 'input_param2' : '__FLAG__',
#                 'input_param3' : None
                }

## TODO: here goes the return value of the component_seed.
## DO NOT USE, Not implemented yet!
return_value = []

Example: This is an example showing the content of a component_params.py file:

input_files  = {'tumour':'__REQUIRED__',
                'normal':'__REQUIRED__',
                'reference':'__REQUIRED__',
                'model':'__REQUIRED__',
                'config':'metadata.config',
                'positions_file':None
                }

output_files = {'export_features':None,
                'log_file':'mutationSeq_run.log',
                'out':'__REQUIRED__'
                }

input_params = {'all':'__FLAG__',
                'buffer_size':'2G',
                'coverage':4,
                'deep':'__FLAG__',
                'interval':None,
                'no_filter':'__FLAG__',
                'normalized':'__FLAG__',
                'normal_variant':25,
                'purity':70,
                'mapq_threshold':20,
                'baseq_threshold':10,
                'indl_threshold':0.05,
                'manifest':'__OPTIONAL__',
                'single':'__FLAG__',
                'threshold':0.5,
                'tumour_variant':2,
                'features_only':'__FLAG__',
                'verbose':'__FLAG__',
                'titan_mode':'__FLAG__'
                }
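
As an illustration of how the __REQUIRED__ flag can be read, the following sketch lists the parameters marked as required that have not been given a value. It is an assumption about how one might use these dictionaries, not a description of what kronos does internally; the helper name missing_required and the supplied values are hypothetical, and only plain (non-list) values are handled.

```python
REQUIRED = '__REQUIRED__'


def missing_required(param_defaults, supplied):
    """Return the names marked __REQUIRED__ that have no supplied value."""
    return sorted(
        name for name, default in param_defaults.items()
        if default == REQUIRED and supplied.get(name) is None
    )


# Shortened versions of the dictionaries from the example above.
input_files = {'tumour': '__REQUIRED__', 'normal': '__REQUIRED__',
               'config': 'metadata.config', 'positions_file': None}
output_files = {'log_file': 'mutationSeq_run.log', 'out': '__REQUIRED__'}

all_params = dict(input_files, **output_files)

# Hypothetical user-supplied values.
print(missing_required(all_params, {'tumour': 't.bam', 'out': 'calls.vcf'}))
```

With these inputs only normal is reported, because tumour and out were supplied and the remaining parameters have defaults or are optional.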

Component_reqs

This is a python module and contains the following information:

  • env_vars: a dictionary whose keys are the names of environment variables and whose values are the path/content to export. The values can be updated in the configuration file using env_var in the run subsection. Therefore, it is recommended not to include the paths as values in this file and to use an empty list, [], or None as a value instead.
  • memory: specifies the minimum memory required by the component to properly run on a cluster. The format is nG, e.g. 30G.
  • parallel: a boolean flag that specifies whether or not a component can run in parallel mode.
  • requirements: a dictionary whose keys are usually the names of programs/software and whose values are None or the flag __REQUIRED__. The values are later updated by kronos using the content of the __GENERAL__ section.
  • seed_version: the version of the seed.
  • version: the version of the component.

Using the make_component command, the following component_reqs.py file is generated:

"""
component_reqs.py

Note the places you need to change to make it work for you.
They are marked with keyword 'TODO'.
"""

## TODO: here goes the list of the environment variables, if any,
## required to export for the component to function properly.
env_vars = {
#            'env_var1' : ['value1', 'value2'],
#            'env_var2' : 'value3'
            }

## TODO: here goes the max amount of the memory required.
memory = '5G'

## TODO: set this to True if the component is parallelizable.
parallel = False

## TODO: here goes the list of the required software/apps
## called by the component.
requirements = {
#                'python': '__REQUIRED__',
                }

## TODO: here goes the version of the component seed.
seed_version = '0.99.0'

## TODO: here goes the version of the component itself.
version = '0.99.0'

Example: This is an example showing the content of a component_reqs.py:

env_vars = {'LD_LIBRARY_PATH': []}

memory = '4G'

parallel = True

requirements = {'java': '__REQUIRED__'}

seed_version = 'version 3.2'

version = 'v1.0.1'
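
The constraints described in this section can be expressed as a small sanity check. This is a sketch under the assumptions stated in the text (memory in the nG format, parallel a boolean, requirement values None or __REQUIRED__); the function name check_reqs is hypothetical, and kronos itself may be more lenient.

```python
import re


def check_reqs(memory, parallel, requirements):
    """Return a list of problems found in component_reqs-style values."""
    problems = []
    # The text above gives the memory format as nG, e.g. 30G.
    if not isinstance(memory, str) or not re.fullmatch(r'\d+G', memory):
        problems.append("memory should look like '4G'")
    if not isinstance(parallel, bool):
        problems.append('parallel should be True or False')
    for name, value in requirements.items():
        if value not in (None, '__REQUIRED__'):
            problems.append('unexpected value for requirement ' + name)
    return problems


# Values taken from the example above.
print(check_reqs('4G', True, {'java': '__REQUIRED__'}))
```

An empty list means that the values follow the conventions described above.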

Component_ui

This is a Python module that contains an argparse UI for the component. Using the make_component command, the following component_ui.py file is generated:

"""
component_ui.py

Note the places you need to change to make it work for you.
They are marked with keyword 'TODO'.
"""

import argparse

#==============================================================================
# make a UI
#==============================================================================
## TODO: pass the name of the component to the 'prog' parameter and a
## brief description of your component to the 'description' parameter.
parser = argparse.ArgumentParser(prog='my_comp',
                                 description = """
                                 brief description of your component goes here.""")

## TODO: create the list of input options here. Add as many as desired.
parser.add_argument(
                    "-x", "--xparam",
                    default = None,
                    help= """
                    help message goes here.
                    """)

## parse the argument parser.
args, unknown = parser.parse_known_args()

Example: This is an example showing the content of a component_ui.py:

import sys
import argparse

#==============================================================================
# make a UI
#==============================================================================
parser = argparse.ArgumentParser(prog='snpeff',
                                 description='''Genetic variant annotation and effect
                                 prediction toolbox. It annotates and predicts the
                                 effects of variants on genes (such as amino acid changes)''',
                                 epilog='''Input file: Default is STDIN''')

# required arguments
required_arguments = parser.add_argument_group("Required arguments")

required_arguments.add_argument("--out",
                               default=None,
                               required=True,
                               help='''specify the path/to/out.vcf to save output to a file''')

# mandatory / positional arguments
required_arguments.add_argument("genome_version",
                               choices=['GRCh37.66'],
                               help='''genomic build version''')

required_arguments.add_argument("variants_file",
                               help='''file containing variants''')

# optional options
optional_options = parser.add_argument_group("Options")

optional_options.add_argument("-a", "--around",
                              default=False, action="store_true",
                              help='''Show N codons and amino acids around change
                              (only in coding regions). Default is 0 codons.''')

args, unknown = parser.parse_known_args()

Warning

It is required to use parse_known_args instead of parse_args.
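
The difference between the two calls can be seen with the standard argparse module alone. In the sketch below, --other is a hypothetical option that the parser does not define; a plausible reason for the requirement is that the command line may hold arguments this parser does not know about, but that motivation is an assumption.

```python
import argparse

parser = argparse.ArgumentParser(prog='my_comp')
parser.add_argument('-x', '--xparam', default=None)

# '--other' is a hypothetical option that this parser does not define.
argv = ['--xparam', '1', '--other', '2']

# parse_known_args returns the namespace plus the leftover arguments.
args, unknown = parser.parse_known_args(argv)
print(args.xparam, unknown)

# parse_args rejects the same command line and exits with an error.
try:
    parser.parse_args(argv)
except SystemExit:
    print('parse_args exited on the unrecognized arguments')
```

With parse_known_args the unrecognized arguments are collected in the second return value instead of aborting the program.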

component_seed

This is a directory within the component directory where all the source code of the actual program resides.

Examples

Please refer to the components repository for more examples.

Components repository

All the production components can be cloned from the components repository.

Tip

You need to export the path where you have cloned the components to the PYTHONPATH environment variable:

export PYTHONPATH=</path/to/components_dir>:$PYTHONPATH
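
Whether the export took effect can be checked from Python. The following minimal sketch uses the standard importlib module; the function name is_importable is hypothetical, and my_comp stands for whichever top-level component name you cloned.

```python
import importlib.util


def is_importable(package_name):
    """Return True if a top-level package_name is found on the import path."""
    return importlib.util.find_spec(package_name) is not None


# 'my_comp' is a placeholder for the name of a cloned component.
print(is_importable('my_comp'))
```

If this prints False, the components directory is most likely missing from PYTHONPATH in the current shell.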