Where to add a new format

In the workflows/ directory, jflow provides a formats.py file where new formats can be added.

jflow/
├── bin/
├── docs/
├── src/
├── workflows/
│   ├── components/
│   ├── extparsers/
│   ├── __init__.py
│   ├── formats.py   [ file in which to add new jflow formats ]
│   ├── rules.py
│   └── types.py
├── applications.properties
└── README

How to add a new format

In jflow, a format is represented by a function whose name is the format name. The function takes a single argument: the file path given by the user. It is responsible for opening the file and checking its content. If an error occurs, or if the content does not meet the expected criteria, the function should raise a jflow.InvalidFormatError with an appropriate error message. jflow uses this message to report the error to the final user.
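As a further illustration of this contract, here is a minimal sketch of a hypothetical "tabular" format (not part of jflow) that requires the first 10 lines of a file to have the same number of tab-separated columns. So that it runs outside jflow, it raises a local InvalidFormatError class standing in for jflow.InvalidFormatError; inside workflows/formats.py you would import jflow and raise jflow.InvalidFormatError instead.

```python
import os
import tempfile


class InvalidFormatError(Exception):
    """Local stand-in for jflow.InvalidFormatError so the sketch runs outside jflow."""


def tabular(ifile):
    """Hypothetical format: the first 10 lines must all have the same number of tab-separated columns."""
    try:
        with open(ifile) as handle:
            nb_columns = None
            for nb_line, line in enumerate(handle, start=1):
                columns = line.rstrip("\n").split("\t")
                if nb_columns is None:
                    nb_columns = len(columns)
                elif len(columns) != nb_columns:
                    raise ValueError("line %d has %d columns instead of %d"
                                     % (nb_line, len(columns), nb_columns))
                # only check the first 10 lines
                if nb_line == 10:
                    break
            if nb_columns is None:
                raise ValueError("the file is empty")
    except (OSError, ValueError) as err:
        # This message is what jflow would report to the final user.
        raise InvalidFormatError("The provided file '" + ifile + "' is not a tabular file: " + str(err))


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
        tmp.write("a\tb\tc\n1\t2\n")
    try:
        tabular(tmp.name)
    except InvalidFormatError as err:
        print(err)
    finally:
        os.remove(tmp.name)
```

Both I/O errors (missing or unreadable file) and content errors are turned into the same exception type, so the caller only has to handle one kind of failure.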

In the following example, the fasta function checks whether the first 10 sequences of the input file can be read as FASTA:

import jflow
from jflow import seqio

def fasta(ifile):
    try:
        reader = seqio.FastaReader(ifile, wholefile=True)
        nb_seq = 0
        for seq_id, desc, seq, qualities in reader:
            nb_seq += 1
            # only check the first 10 sequences
            if nb_seq == 10:
                break
    except Exception:
        # any read or parsing error means the file is not a valid fasta file
        raise jflow.InvalidFormatError("The provided file '" + ifile + "' is not a fasta file!")
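The seqio.FastaReader reader used above is not available outside a jflow installation. To experiment with the same pattern on its own, the check can be approximated in plain Python. The sketch below is a simplified stand-in, not jflow's parser: it only verifies that sequence lines come after a '>' header and contain letters or the '*', '.' and '-' symbols, and it stops at the 10th header. InvalidFormatError is a local stand-in for jflow.InvalidFormatError.

```python
import os
import re
import tempfile


class InvalidFormatError(Exception):
    """Local stand-in for jflow.InvalidFormatError."""


# Letters plus the gap/stop symbols commonly found in FASTA sequence lines.
SEQUENCE_LINE = re.compile(r"[A-Za-z*.\-]+")


def fasta(ifile):
    """Simplified FASTA check in plain Python; NOT jflow's seqio-based parser."""
    try:
        with open(ifile) as handle:
            nb_seq = 0
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines
                if line.startswith(">"):
                    nb_seq += 1
                    # only check up to the 10th header
                    if nb_seq == 10:
                        break
                elif nb_seq == 0:
                    raise ValueError("sequence data found before the first '>' header")
                elif not SEQUENCE_LINE.fullmatch(line):
                    raise ValueError("unexpected characters in sequence line: " + line[:20])
            if nb_seq == 0:
                raise ValueError("no '>' header found")
    except (OSError, ValueError) as err:
        raise InvalidFormatError("The provided file '" + ifile + "' is not a fasta file! (" + str(err) + ")")


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
        tmp.write("ACGT\n>seq1\nACGT\n")
    try:
        fasta(tmp.name)
    except InvalidFormatError as err:
        print(err)
    finally:
        os.remove(tmp.name)
```

Appending the underlying reason to the message is a design choice for this sketch; it tells the final user which rule the file broke rather than only that it was rejected.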

How to use a new format

The newly created format can then be used in all add_input_* methods of the jflow.workflow.Workflow and jflow.component.Component classes, as follows:

[...]
def define_parameters(self, function="process"):
    self.add_input_file("reference_genome", "Which genome should the reads be aligned on", file_format="fasta", required=True)
[...]
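The file_format value is a plain string that matches the name of a function in formats.py. One way to picture the lookup is the sketch below. check_file_format, the nonempty format and the in-memory formats module are hypothetical and do not reproduce jflow's internal code, which may resolve and call format functions differently; InvalidFormatError is a local stand-in for jflow.InvalidFormatError.

```python
import os
import tempfile
import types


class InvalidFormatError(Exception):
    """Local stand-in for jflow.InvalidFormatError."""


def nonempty(ifile):
    """Hypothetical format: any file that exists and is not empty."""
    try:
        size = os.path.getsize(ifile)
    except OSError as err:
        raise InvalidFormatError("The provided file '" + ifile + "' cannot be read: " + str(err))
    if size == 0:
        raise InvalidFormatError("The provided file '" + ifile + "' is empty!")


# Stand-in for workflows/formats.py: a module whose attributes are format functions.
formats = types.ModuleType("formats")
formats.nonempty = nonempty


def check_file_format(formats_module, file_format, path):
    """Hypothetical helper: resolve file_format to the function of the same name and call it."""
    checker = getattr(formats_module, file_format, None)
    if not callable(checker):
        raise ValueError("Unknown file format: '" + file_format + "'")
    checker(path)  # raises InvalidFormatError when the file does not match


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
        pass  # leaves an empty file
    try:
        check_file_format(formats, "nonempty", tmp.name)
    except InvalidFormatError as err:
        print(err)  # the message that would be shown to the final user
    finally:
        os.remove(tmp.name)
```

Because the lookup is by name, adding a correctly named function to formats.py is, under this model, all that is needed for file_format="..." to recognize it.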