Application Package¶
The Application Package defines the internal script definition and configuration that will be executed by a process. This package is based on Common Workflow Language (CWL). Using the extensive CWL Specification as the backbone for internal execution of the process allows it to run multiple types of applications, whether they are referenced by Docker image, bash script or otherwise.
Note
The large community and the range of use cases covered by CWL make it extremely versatile. If you encounter any issue running your Application Package in Weaver (such as file permissions), chances are that a workaround exists somewhere in the CWL Specification. Most typical problems are handled by some flag or argument in the CWL definition, so this reference should be explored first. Please also refer to the FAQ section as well as existing Weaver issues. Ultimately, if no solution can be found, open a new issue about your specific problem.
All processes deployed locally into Weaver using a CWL package definition will have their full package definition
available with GET {WEAVER_URL}/processes/{id}/package
(Package) request.
Note
GET {WEAVER_URL}/processes/{id}/package
(Package) is a Weaver-specific implementation and is therefore not necessarily available on other ADES/EMS
implementations, as this feature is not part of the OGC API - Processes specification.
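As a minimal sketch of how this request could be issued from Python, the helper below builds the endpoint URL and retrieves the package. The host name is hypothetical, and the sketch assumes that the endpoint answers with a JSON body; neither the function names nor the host come from Weaver itself.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def package_url(weaver_url: str, process_id: str) -> str:
    """Build the Weaver-specific package endpoint of a deployed process."""
    return f"{weaver_url.rstrip('/')}/processes/{quote(process_id, safe='')}/package"


def fetch_package(weaver_url: str, process_id: str) -> dict:
    """Retrieve the package definition (assumes a reachable instance and a JSON response)."""
    with urlopen(package_url(weaver_url, process_id)) as response:
        return json.load(response)


# Hypothetical instance URL; this line only builds the URL and sends no request.
print(package_url("https://weaver.example.com/", "jsonarray2netcdf"))
```

Calling `fetch_package` against another ADES/EMS implementation may fail with an HTTP error, since the endpoint is not part of the common specification.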
Typical CWL Package Definition¶
CWL CommandLineTool¶
The following CWL package definition represents the weaver.processes.builtin.jsonarray2netcdf
process.
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
# target the installed python pointing to weaver conda env to allow imports
baseCommand: python
arguments:
  - "${WEAVER_ROOT_DIR}/weaver/processes/builtin/jsonarray2netcdf.py"
  - "-o"
  - "$(runtime.outdir)"
inputs:
  input:
    type: File
    format: iana:application/json
    inputBinding:
      position: 1
      prefix: "-i"
outputs:
  output:
    format: edam:format_3650
    type:
      type: array
      items: File
    outputBinding:
      glob: "*.nc"
$namespaces:
  iana: "https://www.iana.org/assignments/media-types/"
  edam: "http://edamontology.org/"
The first main component is the class: CommandLineTool
that tells Weaver it will be a base process
(contrary to the CWL Workflow presented later).
The other important sections are inputs
and outputs
. These define which parameters will be expected and
produced by the described application. Weaver supports most formats and types as specified by CWL Specification.
See Inputs/Outputs Type for more details.
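To make the role of `baseCommand`, `arguments` and `inputBinding` more concrete, the following sketch assembles the command line that such a definition roughly translates to. It is a deliberate simplification of the CWL binding rules (only plain string arguments, `position` and `prefix` are handled), and the helper name, file paths and shortened script path are assumptions made for illustration.

```python
def build_command(tool: dict, job: dict, outdir: str) -> list:
    """Assemble an argument list from baseCommand, arguments and bound inputs."""
    command = [tool["baseCommand"]]
    # Plain string arguments carry no explicit position, so they are placed first.
    for argument in tool.get("arguments", []):
        command.append(argument.replace("$(runtime.outdir)", outdir))
    # Inputs that define an inputBinding follow, ordered by position then name.
    bound = [
        (spec["inputBinding"].get("position", 0), name, spec["inputBinding"])
        for name, spec in tool["inputs"].items()
        if "inputBinding" in spec
    ]
    for _, name, binding in sorted(bound, key=lambda item: (item[0], item[1])):
        if "prefix" in binding:
            command.append(binding["prefix"])
        command.append(str(job[name]))
    return command


tool = {
    "baseCommand": "python",
    "arguments": ["jsonarray2netcdf.py", "-o", "$(runtime.outdir)"],
    "inputs": {
        "input": {"type": "File", "inputBinding": {"position": 1, "prefix": "-i"}}
    },
}
print(build_command(tool, {"input": "/tmp/array.json"}, "/tmp/out"))
```

The actual command is produced by the CWL executor, which also stages files and evaluates expressions; the sketch only shows the ordering logic.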
CWL Workflow¶
Weaver also supports CWL class: Workflow
. When an Application Package is defined this way, the process
deployment operation will attempt to resolve each step
as another process. The reference to the CWL definition
can be placed in any of the locations supported for atomic processes
(see details about supported package locations).
The following CWL definition demonstrates an example Workflow
process that would resolve each step
with
local processes of matching IDs.
{
  "cwlVersion": "v1.0",
  "class": "Workflow",
  "requirements": [
    {
      "class": "StepInputExpressionRequirement"
    }
  ],
  "inputs": {
    "tasmax": {
      "type": {
        "type": "array",
        "items": "File"
      }
    },
    "lat0": "float",
    "lat1": "float",
    "lon0": "float",
    "lon1": "float",
    "freq": {
      "default": "YS",
      "type": {
        "type": "enum",
        "symbols": ["YS", "MS", "QS-DEC", "AS-JUL"]
      }
    }
  },
  "outputs": {
    "output": {
      "type": "File",
      "outputSource": "ice_days/output_netcdf"
    }
  },
  "steps": {
    "subset": {
      "run": "ColibriFlyingpigeon_SubsetBbox.cwl",
      "in": {
        "resource": "tasmax",
        "lat0": "lat0",
        "lat1": "lat1",
        "lon0": "lon0",
        "lon1": "lon1"
      },
      "out": ["output"]
    },
    "json2nc": {
      "run": "jsonarray2netcdf",
      "in": {
        "input": "subset/output"
      },
      "out": ["output"]
    },
    "ice_days": {
      "run": "Finch_IceDays.cwl",
      "in": {
        "tasmax": "json2nc/output",
        "freq": "freq"
      },
      "out": ["output_netcdf"]
    }
  }
}
For instance, the jsonarray2netcdf
(Builtin) middle step in this example corresponds to the
CWL CommandLineTool process presented in the previous section. Other processes referenced in this Workflow
can be
found in Weaver Test Resources.
Step process names are resolved using the variations presented below. Particular care must also be given to the input and output definitions between each step.
Step Reference¶
In order to resolve referenced processes as steps, Weaver supports 3 formats.
1. Process ID explicitly given (e.g.: jsonarray2netcdf resolves to pre-deployed weaver.processes.builtin.jsonarray2netcdf). Any visible process from the GET {WEAVER_URL}/processes (GetCapabilities) response should be resolved this way.
2. Full URL to the process description endpoint, provided that it also offers a GET {WEAVER_URL}/processes/{id}/package (Package) endpoint (Weaver-specific).
3. Full URL to the explicit CWL file (usually corresponding to (2) or the href provided in deployment body).
When a URL to the CWL process “file” is provided with an extension, it must be one of the supported values defined
in weaver.processes.wps_package.PACKAGE_EXTENSIONS
. Otherwise, Weaver will refuse it as it cannot figure
out how to parse it.
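The three reference formats and the extension check can be pictured with the sketch below. It is only an approximation: the extension set is an assumed stand-in for `weaver.processes.wps_package.PACKAGE_EXTENSIONS` (whose actual values should be checked in the code base), and the function name and example URLs are hypothetical.

```python
import posixpath
from urllib.parse import urlparse

# Assumed values for illustration only; the authoritative list is
# weaver.processes.wps_package.PACKAGE_EXTENSIONS.
ASSUMED_PACKAGE_EXTENSIONS = {"cwl", "json", "yaml", "yml"}


def classify_step_reference(reference: str) -> str:
    """Guess which of the three supported formats a step reference uses."""
    parsed = urlparse(reference)
    if parsed.scheme not in ("http", "https"):
        return "process-id"
    extension = posixpath.splitext(parsed.path)[1].lstrip(".").lower()
    if not extension:
        return "process-url"
    if extension in ASSUMED_PACKAGE_EXTENSIONS:
        return "cwl-file-url"
    raise ValueError(f"unsupported package extension: {extension!r}")


print(classify_step_reference("jsonarray2netcdf"))
print(classify_step_reference("https://weaver.example.com/processes/jsonarray2netcdf"))
print(classify_step_reference("https://example.com/packages/tool.cwl"))
```

A process ID that itself contains a dot would be misread by this naive extension test, which is one reason the sketch should not be taken as the actual resolution logic.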
Because Weaver and the underlying CWL executor need to resolve all steps in order to validate that their input and
output definitions correspond (id, format, type, etc.) before chaining them, all intermediate processes MUST
be available. This means that you cannot deploy (see Register a new process (Deploy)) nor execute (see Execution of a process (Execute)) a Workflow-flavored Application Package
until all referenced steps have themselves been deployed and made visible.
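Before deploying a Workflow, it can be useful to check which steps are not yet available. The sketch below does so against a set of already deployed process IDs (which could be collected from the GetCapabilities response). It assumes that a trailing `.cwl` in a step reference is ignored when matching local IDs; that assumption and the helper name are illustrative, not taken from Weaver.

```python
def missing_steps(workflow: dict, deployed_ids: set) -> list:
    """Return the names of steps whose referenced process is not deployed."""
    missing = []
    for step_name, step in workflow["steps"].items():
        reference = step["run"]
        # Assumption: a trailing ".cwl" is ignored when matching local process IDs.
        process_id = reference[:-4] if reference.endswith(".cwl") else reference
        if process_id not in deployed_ids:
            missing.append(step_name)
    return missing


workflow = {
    "steps": {
        "subset": {"run": "ColibriFlyingpigeon_SubsetBbox.cwl"},
        "json2nc": {"run": "jsonarray2netcdf"},
        "ice_days": {"run": "Finch_IceDays.cwl"},
    }
}
print(missing_steps(workflow, {"jsonarray2netcdf", "Finch_IceDays"}))
```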
Warning
Because Weaver needs to convert given CWL documents into equivalent WPS process definition, embedded CWL
processes within a Workflow
step are not supported currently. This is a known limitation of the implementation,
but not much can be done about it without major modifications to the code base.
See also issue #56.
Step Inputs/Outputs¶
Inputs and outputs of connected steps are required to match types and formats in order for the workflow to be valid.
This means that a process that produces an output of type String
cannot be directly chained to a process that takes
as input a File
, even if the String
of the first process represents a URL that could be resolved to a valid
file reference. In order to chain two such processes, an intermediate operation would need to be defined to explicitly
convert the String
input to the corresponding File
output. This is usually accomplished using Builtin
processes, such as in the previous example.
Since formats must also match (e.g.: a process producing application/json
cannot be mapped to one expecting
application/x-netcdf
), all mismatching formats must also be converted with an intermediate step if such operation
is desired. This ensures that workflow definitions are always explicit and that as little interpretation, variation or
assumption as possible occurs between each execution. Because of this, all applications generated by Weaver will attempt to
preserve and enforce matching input/output format
definition in both CWL and WPS as long as it does not
introduce ambiguous results (see File Format for more details).
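The matching rule between connected steps can be sketched as a small check on one output and the input it feeds. This is only an illustration of the rule described above: the helper name is hypothetical, and treating a missing format as "no format validation" is an assumption of the sketch.

```python
def link_problems(output_def: dict, input_def: dict) -> list:
    """List the reasons why an output cannot be chained directly to an input."""
    problems = []
    if output_def.get("type") != input_def.get("type"):
        problems.append(
            f"type mismatch: {output_def.get('type')} -> {input_def.get('type')}"
        )
    out_format, in_format = output_def.get("format"), input_def.get("format")
    # Assumption: a format missing on either side means no format validation.
    if out_format and in_format and out_format != in_format:
        problems.append(f"format mismatch: {out_format} -> {in_format}")
    return problems


print(link_problems({"type": "string"}, {"type": "File"}))
print(link_problems(
    {"type": "File", "format": "iana:application/json"},
    {"type": "File", "format": "edam:format_3650"},
))
```

Any non-empty result indicates that an intermediate conversion step would be needed between the two processes.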
Correspondence between CWL and WPS fields¶
Because the CWL definition and the WPS process description inherently provide “duplicate” information, many fields can be mapped between one another. In order to handle any provided metadata in the various locations supported by both specifications, as well as to extend details of deployed processes, each Application Package gets its details merged with the complementary WPS description.
In some cases, complementary details are only documentation-related, but some information directly affects the format or
execution behaviour of some parameters. A common example is the maxOccurs
field provided by WPS that does not
have an exactly corresponding specification in CWL (any-sized array). On the other hand, CWL also provides data
preparation steps such as initial staging (i.e.: InitialWorkDirRequirement
) that do not have an equivalent under
the WPS process description. For this reason, complementary details are merged and reflected on both sides
(as applicable), when non-ambiguous resolution is possible.
In case of conflicting metadata, the CWL specification will most of the time prevail over the WPS metadata fields
simply because it is expected that a strict CWL specification is provided upon deployment. The only exceptions to this
situation are when the WPS specification helps resolve some ambiguity or when WPS reinforces the parametrisation of some
elements, such as with maxOccurs
field.
Note
The metadata merge operation between CWL and WPS is accomplished on a per-mapped-field basis. In other words, more
explicit details such as maxOccurs
could be obtained from WPS and simultaneously the same input’s
format
could be obtained from the CWL side. Merge occurs bidirectionally for corresponding information.
The merging strategy of process specifications also implies that some details can be omitted from one context if they
can be inferred from corresponding elements in the other. For example, the CWL and WPS context both define
keywords
(with minor naming variation) as a list of strings. Specifying this metadata in both locations is redundant
and only makes the process description longer. Therefore, the user is allowed to provide only one of the two and
Weaver will take care to propagate the information to the lacking location.
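The per-field merging strategy can be pictured with the following sketch. It assumes that corresponding fields have already been mapped to common names, and it encodes the rule that CWL prevails on conflicts except for fields where WPS is more explicit. The set of WPS-preferred fields and the helper name are assumptions made for illustration, not Weaver's actual implementation.

```python
# Assumption drawn from the maxOccurs example above; not an exhaustive list.
WPS_PREFERRED_FIELDS = {"maxOccurs"}


def merge_io(cwl_io: dict, wps_io: dict) -> dict:
    """Merge two descriptions of the same I/O, one mapped field at a time."""
    merged = {}
    for field in sorted(set(cwl_io) | set(wps_io)):
        if field in cwl_io and field in wps_io:
            # Conflict: CWL prevails unless WPS is known to be more explicit.
            source = wps_io if field in WPS_PREFERRED_FIELDS else cwl_io
            merged[field] = source[field]
        else:
            # Only one side provides the field: propagate it to the result.
            merged[field] = cwl_io[field] if field in cwl_io else wps_io[field]
    return merged


cwl_io = {"id": "tasmax", "format": "edam:format_3650", "title": "from CWL"}
wps_io = {"id": "tasmax", "maxOccurs": 5, "title": "from WPS"}
print(merge_io(cwl_io, wps_io))
```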
In order to help understand the resolution methodology between the contexts, the following sub-sections cover the supported mapping between the two specifications and, more specifically, how each field impacts the mapped equivalent metadata.
Warning
Merging of corresponding fields between CWL and WPS is a Weaver-specific implementation. The same behaviour is not necessarily supported by other implementations. For this reason, any converted information between the two contexts will be transferred to the other context if missing, in order for both specifications to reflect similar details as closely as possible, whichever context the metadata originated from.
Inputs/Outputs ID¶
Inputs and outputs (I/O) id
from the CWL context will be respectively matched against corresponding id
or
identifier
field from I/O of WPS context. In the CWL definition, all of the allowed I/O structures are
supported, whether they are specified using an array list with explicit definitions, using “shortcut” variant, or using
key-value pairs (see CWL Mapping for more details). Regardless of array or mapping format, CWL
requires that all I/O have unique id
. On the WPS side, a list of I/O is always expected. This is because
WPS I/O with multiple values (array in CWL) are specified by repeating the id
with each value instead of
defining the value as a list of those values during Execution of a process (Execute) request (see also Multiple Inputs).
To summarize, the following CWL and WPS I/O definitions are all equivalent and will result in the same process
definition after deployment. For simplicity, the examples below omit all but mandatory fields (only of the
inputs
and outputs
portion of the full deployment body) to produce the same result.
Other fields are discussed afterward in specific sections.
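As an illustration of this equivalence, the sketch below normalizes the three CWL structures (array list, "shortcut" variant and key-value mapping) into one common list form, next to a WPS counterpart. The I/O name `single`, the MIME-type and the helper name are hypothetical and only serve the example.

```python
def normalize_cwl_io(section) -> list:
    """Convert a CWL inputs/outputs section into a list of items with an 'id'."""
    if isinstance(section, list):
        return [dict(item) for item in section]
    items = []
    for io_id, definition in section.items():
        if isinstance(definition, dict):
            items.append({"id": io_id, **definition})
        else:
            # "Shortcut" variant: the value is directly the type.
            items.append({"id": io_id, "type": definition})
    return items


cwl_as_list = [{"id": "single", "type": "File"}]
cwl_as_shortcut = {"single": "File"}
cwl_as_mapping = {"single": {"type": "File"}}
# Hypothetical WPS counterpart: a list entry whose formats mark it as a file.
wps_as_list = [{"id": "single", "formats": [{"mimeType": "text/plain"}]}]

for section in (cwl_as_list, cwl_as_shortcut, cwl_as_mapping):
    print(normalize_cwl_io(section))
```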
The WPS example above requires a format
field for the corresponding CWL File
type in order to distinguish
it from a plain string. More details are available in Inputs/Outputs Type below about this requirement.
Finally, it is to be noted that the above CWL and WPS definitions can be specified in the Register a new process (Deploy) request body with any of the following variations:
- Both are simultaneously fully specified (valid although extremely verbose).
- Both partially specified as long as sufficient complementary information is provided.
- Only CWL I/O is fully provided (with empty or even unspecified inputs or outputs section from WPS).
Warning
Weaver assumes that its main purpose is to eventually execute an Application Package and will therefore
prioritize specification in CWL over WPS. Because of this, any unmatched id
from the WPS context against
provided CWL id
s of the same I/O section will be dropped, as they ultimately would have no purpose during
CWL execution.
This does not apply in the case of referenced WPS-1/2 processes since no CWL is available in the first place.
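The dropping of unmatched WPS entries can be sketched as a simple filter on the CWL identifiers of the same I/O section. The helper name and sample values are hypothetical; the sketch only mirrors the behaviour described in the warning above.

```python
def keep_matching_wps_io(wps_io: list, cwl_ids: set) -> list:
    """Keep only the WPS I/O entries whose id (or identifier) exists in CWL."""
    return [
        io for io in wps_io
        if io.get("id", io.get("identifier")) in cwl_ids
    ]


wps_inputs = [
    {"id": "input", "title": "Input file"},
    {"identifier": "unused", "title": "Never consumed by the package"},
]
print(keep_matching_wps_io(wps_inputs, {"input"}))
```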
Inputs/Outputs Type¶
In the CWL context, the type
field indicates the type of I/O. Available types are presented in the
CWLType Symbols portion of the specification.
Warning
Weaver does not support two values of the CWL type
, namely Any
and Directory
. This limitation is intentional
as WPS does not offer equivalents. Furthermore, both of these types make the process description too ambiguous.
For instance, most processes expect remote file references, and providing a Directory
doesn’t indicate an
explicit reference to which files to retrieve during stage-in operation of a job execution.
In the WPS context, three data types exist, namely Literal
, BoundingBox
and Complex
data.
As presented in the example of the previous section, I/O in the WPS context does not require an explicit indication
of the type from one of Literal
, BoundingBox
and Complex
data. Instead, WPS type is inferred using the
matched API schema of the I/O. For instance, Complex
I/O (i.e.: file reference) requires the formats
field to
distinguish it from a plain string
. Therefore, specifying either format
in CWL or formats
in WPS
immediately provides all needed information for Weaver to understand that this I/O is expected to be a file reference.
A crs
field would otherwise indicate a BoundingBox
I/O (see note). If neither of the two
previous schemas is matched, the I/O type resolution falls back to Literal
data of string
type. To employ
another primitive data type such as Integer
, an explicit indication needs to be provided as follows.
{
  "id": "input",
  "literalDataDomains": [
    {"dataType": {"name": "integer"}}
  ]
}
Obviously, the equivalent CWL definition is simpler in this case (i.e.: only type: int
required). It is therefore
recommended to take advantage of Weaver’s merging strategy in this case by providing only the details through the
CWL definition and have the corresponding WPS I/O type automatically deduced by the generated process.
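The type inference described in this section can be summarized by the following sketch, which inspects a WPS I/O definition for the `formats` and `crs` fields and otherwise falls back to a literal of type `string`. The function names are hypothetical, and the sketch follows the field names used in this section rather than Weaver's internal schema handling.

```python
def infer_wps_type(io: dict) -> str:
    """Infer the WPS data type from the fields present in an I/O definition."""
    if io.get("formats"):
        return "Complex"
    if "crs" in io:
        return "BoundingBox"
    return "Literal"


def literal_data_type(io: dict) -> str:
    """Return the explicit literal data type when given, else 'string'."""
    for domain in io.get("literalDataDomains") or []:
        name = domain.get("dataType", {}).get("name")
        if name:
            return name
    return "string"


print(infer_wps_type({"id": "file", "formats": [{"mimeType": "application/json"}]}))
print(infer_wps_type({"id": "text"}), literal_data_type({"id": "text"}))
print(literal_data_type(
    {"id": "input", "literalDataDomains": [{"dataType": {"name": "integer"}}]}
))
```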
Note
As of the current version of Weaver, WPS data type BoundingBox
is not supported. The schema definition
exists in WPS context but is not handled by any CWL type conversion yet. This feature is reflected
by issue #51. It is possible to use a Literal
data of
type string
corresponding to WKT 1, 2 in the meantime.
File Format¶
An input or output resolved as CWL File
type, equivalent to a WPS ComplexData
, supports format
specification. Every mimeType
field nested under formats
entries of the WPS definition will be mapped against
corresponding namespaced format
of CWL.
For example, the following input definitions are equivalent in both contexts.
As demonstrated, both contexts accept multiple formats for inputs. These effectively represent supported formats by
the underlying application. The two MIME-types selected for this example are chosen specifically to demonstrate how
CWL formats must be specified. More precisely, CWL requires a real schema definition referencing an existing
ontology to validate formats, specified through the $namespaces
section. Each format entry is then defined as a
mapping of the appropriate namespace to the identifier of the ontology. Alternatively, you can also provide the full
URL of the ontology reference in the format string.
Like many other fields, this information can quickly become redundant and difficult to maintain. For this reason,
Weaver will automatically fill in the missing detail if only one of the two corresponding definitions between CWL and
WPS is provided. In other words, an application developer could specify only the I/O’s formats
in the WPS
portion during process deployment, and Weaver will take care to update the matching CWL definition without any user
intervention. This also makes it easier for the user to specify supported formats since it is generally easier to
remember MIME-type names than ontology references. Weaver has a large set of common MIME-types that it knows how to
convert to corresponding ontologies. Also, Weaver will look up any new MIME-type it doesn’t explicitly know about
in the IANA ontology in order to attempt to resolve it automatically.
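A minimal sketch of such a conversion is shown below, using the two namespaces and the two formats that appear in the `jsonarray2netcdf` package. The lookup table is intentionally tiny, the function name is hypothetical, and the fallback simply prefixes unknown MIME-types with the IANA namespace without verifying that the reference exists, which is an assumption of this sketch rather than Weaver's actual resolution.

```python
NAMESPACES = {
    "iana": "https://www.iana.org/assignments/media-types/",
    "edam": "http://edamontology.org/",
}

# Small illustrative table, limited to formats shown in this document.
KNOWN_FORMATS = {
    "application/x-netcdf": ("edam", "format_3650"),
    "application/json": ("iana", "application/json"),
}


def mime_to_cwl_format(mime_type: str) -> tuple:
    """Return the namespaced CWL format and the $namespaces entry it requires."""
    # Assumption: unknown MIME-types fall back to the IANA namespace, unverified.
    prefix, identifier = KNOWN_FORMATS.get(mime_type, ("iana", mime_type))
    return f"{prefix}:{identifier}", {prefix: NAMESPACES[prefix]}


print(mime_to_cwl_format("application/x-netcdf"))
print(mime_to_cwl_format("text/plain"))
```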
When formats are resolved between the two contexts, Weaver applies information in a complementary fashion. This means
for example that if the user provided application/x-netcdf
on the WPS side and iana:application/json
on the
CWL side, both resulting contexts will have both of those formats combined. Weaver will not favour one location over
the other, but will rather merge them if they can be resolved into different and valid entities.
Since format
is a required field for WPS ComplexData
definitions (see Inputs/Outputs Type) and since
MIME-types are easier to provide in this context, it is recommended to provide all of them in the WPS definition.
Output File Format¶
Warning
Format specification differs between CWL and WPS in the case of outputs.
Although WPS definition allows multiple supported formats for output that are later resolved to the applied one
onto the produced result of the job, CWL only considers the output format
that directly indicates the applied
schema. There is no concept of supported format in the CWL world. This is simply because CWL cannot predict nor
reliably determine which output will be produced by a given application execution without running it, and therefore
cannot expose a consistent output specification before running the process. Because CWL must validate the full
process integrity before it can be executed, this means that only a single output format is permitted in its context
(providing many will raise a validation error when parsing the CWL definition).
To ensure compatibility with WPS outputs that support multiple formats, any output that has more than one format
will have its format
field dropped in the corresponding CWL definition. Without any format
on the CWL side,
the validation process will ignore this specification and will effectively accept any type of file. This will not break
any execution operation with CWL, but it will remove the additional validation layer of the format (which especially
deteriorates process resolution when chaining processes inside a CWL Workflow).
If the WPS output only specifies a single MIME-type, then the equivalent format (after being resolved to a valid ontology) will be preserved on the CWL side since the result is ensured to be the unique one provided. For this reason, processes with a specific single-format output are preferred whenever possible. This also removes ambiguity in the expected output format, which usually requires a toggle input specifying the desired type for processes providing a multi-format output. It is instead recommended to produce multiple processes with a fixed output format for each case.
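The output rule can be condensed into the sketch below: a single WPS format is carried over to CWL, while several formats cause the CWL `format` to be dropped. The helper name and the small mapping passed as argument are assumptions made for illustration.

```python
from typing import Optional


def cwl_output_format(wps_formats: list, mime_to_cwl: dict) -> Optional[str]:
    """Return the CWL format to keep for an output, or None when it is dropped."""
    mime_types = [fmt["mimeType"] for fmt in wps_formats]
    if len(mime_types) != 1:
        # Several (or no) supported formats: CWL cannot express them for outputs.
        return None
    return mime_to_cwl.get(mime_types[0])


mapping = {
    "application/x-netcdf": "edam:format_3650",
    "application/json": "iana:application/json",
}
print(cwl_output_format([{"mimeType": "application/x-netcdf"}], mapping))
print(cwl_output_format(
    [{"mimeType": "application/x-netcdf"}, {"mimeType": "application/json"}],
    mapping,
))
```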
Allowed Values¶
Allowed values in the context of WPS LiteralData
provide a means for the application developer to restrict inputs
to a specific set of values. In CWL, the same can be achieved using an enum
definition. Therefore, the following
two variants are equivalent and completely interchangeable.
Weaver propagates such definitions bidirectionally in order to update the CWL or WPS
correspondingly with the provided information in the other context if missing. The primitive type to apply to a missing
WPS specification when resolving it from a CWL definition is automatically inferred with the best matching type
from provided values in the enum
list.
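The inference of the primitive type from enum values can be sketched as follows, by attempting to parse every symbol as an integer, then as a float, and finally falling back to a string. The returned dictionary uses illustrative keys (`dataType`, `allowedValues`) and is not meant to reproduce the exact WPS schema expected by Weaver; both function names are hypothetical.

```python
def infer_literal_type(symbols: list) -> str:
    """Pick the narrowest primitive type able to represent all enum symbols."""
    def all_parse(cast) -> bool:
        try:
            for symbol in symbols:
                cast(str(symbol))
        except ValueError:
            return False
        return True

    if not symbols:
        return "string"
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    return "string"


def enum_to_allowed_values(cwl_type: dict) -> dict:
    """Describe a CWL enum as allowed values with an inferred data type."""
    symbols = list(cwl_type["symbols"])
    return {"dataType": infer_literal_type(symbols), "allowedValues": symbols}


freq = {"type": "enum", "symbols": ["YS", "MS", "QS-DEC", "AS-JUL"]}
print(enum_to_allowed_values(freq))
```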
Note that enum
such as these will also be applied on top of Multiple and Optional Values definitions
presented next.
Multiple and Optional Values¶
Inputs that take multiple values or references can be specified using minOccurs
and maxOccurs
in WPS
context, while they are specified using the array
type in CWL. While the same minOccurs
parameter with a
value of zero (0) can be employed to indicate an optional input, CWL requires the type to specify null
or to
use the shortcut ?
character suffixed to the base type to indicate optional input. Resolution between WPS and
CWL for the merging strategy implies all corresponding parameter combinations and checks in this case.
Because CWL does not take an explicit amount of maximum occurrences, information in this case is not necessarily
completely interchangeable. In fact, WPS is slightly more verbose and easier to define in this case than CWL
because all details are contained within the same two parameters. Because of this, it is often preferable to provide
the minOccurs
and maxOccurs
in the WPS context, and let Weaver infer the array
and/or null
type
requirements automatically. Also, because several parameters can express similar details in this situation,
it is important to avoid providing contradicting specifications as Weaver will have trouble guessing the intended
result when merging specifications. When specifications do conflict, CWL will be employed as the deciding definition to
resolve erroneous mismatches (as for any other corresponding fields).
Todo
update warning according to Weaver issue #25
Warning
Parameters minOccurs
and maxOccurs
are not permitted for outputs in the WPS context. Native WPS
therefore does not permit multiple output reference files. This can be worked around using a Metalink file,
but this use case is not covered by Weaver yet as it requires special mapping with CWL that does support
array
type as output (see issue #25).
Note
Although WPS multi-value inputs are defined as a single entity during deployment, special care must be taken with the format in which these values are specified during execution. Please refer to the Multiple Inputs section of the Execution of a process (Execute) request.
Following are a few examples of equivalent WPS and CWL definitions to represent multiple values under a given input. Some parts of the following definitions are purposely omitted to better highlight the concise details of multiple and optional information.
Todo
minOccurs/maxOccurs + array + WPS repeats IDs vs CWL as list
Todo
example multi-value + enum
It can be noted from the examples that minOccurs
and maxOccurs
can be either an integer
or a string
representing one. This is to support backward compatibility with older WPS specifications that always employed strings
although representing numbers. Weaver understands and handles both cases. Also, maxOccurs
can have the special
value "unbounded"
, in which case the input is considered to be allowed an unlimited number of entries (although
often capped by another implicit machine-level limitation such as memory capacity). In the case of CWL, an array
is always considered as unbounded, therefore WPS is the only context that can limit this amount.
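The correspondence between occurrence parameters and CWL types can be sketched as below. The function accepts integers or their string representation as well as the special `"unbounded"` value; its name and the exact shape of the returned type are illustrative assumptions, not Weaver's actual conversion code.

```python
def occurs_to_cwl_type(base_type: str, min_occurs=1, max_occurs=1):
    """Derive a CWL type from WPS minOccurs/maxOccurs (integers or strings)."""
    minimum = int(min_occurs)
    unbounded = str(max_occurs) == "unbounded"
    maximum = None if unbounded else int(max_occurs)
    cwl_type = base_type
    if unbounded or maximum > 1:
        # CWL arrays carry no upper bound, so the exact maximum is lost here.
        cwl_type = {"type": "array", "items": base_type}
    if minimum == 0:
        # Optional input; ["null", "File"] is the long form of the "File?" shortcut.
        cwl_type = ["null", cwl_type]
    return cwl_type


print(occurs_to_cwl_type("File", 1, 1))
print(occurs_to_cwl_type("File", "0", 1))
print(occurs_to_cwl_type("File", 1, "unbounded"))
print(occurs_to_cwl_type("string", "0", "3"))
```

Because the maximum is lost on the CWL side, converting in the opposite direction can only recover `"unbounded"` for arrays, which is why providing these limits in the WPS context is the safer choice.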
Metadata¶
Todo
(s:)keywords field, doc/label vs abstract/title per-I/O and overall process, etc?
Example: cwl-metadata