BigMultiPipe
- class bigmultipipe.BigMultiPipe(num_processes=None, mem_available=None, mem_frac=0.8, process_size=None, outdir=None, create_outdir=False, outname_append='_bmp', pre_process_list=None, post_process_list=None, PoolClass=None, **kwargs)
Bases: object
Base class for memory- and processing-power-optimized pipelines
- Parameters
- outdir, create_outdir, outname_append, outname : see bigmultipipe.outname_creator
- num_processes, mem_available, mem_frac, process_size : optional
These parameters tune processing and memory resources and are used when the pipeline() method is executed. See the documentation for num_can_process() for their use, noting that the num_to_process argument of that function is set to the number of input filenames in pipeline().
- pre_process_list : list
See the pre_process_list attribute.
- post_process_list : list
See the post_process_list attribute.
- PoolClass : class name or None, optional
Typically a subclass of multiprocessing.pool.Pool. The map() method of this class implements the multiprocessing feature of this module. If None, multiprocessing.pool.Pool is used. Default is None.
- **kwargs : optional
Python’s **kwargs construct stores additional keyword arguments as a dict, accessed in the function as kwargs. To implement the control stream discussed in the introduction to this module, this dict is captured as a property on instantiation. When any method is run, the kwargs passed to that method are merged with the property kwargs using kwargs_merge(). This allows parameters passed to methods at run time to override parameters passed to the object at instantiation time.
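As a rough illustration of the PoolClass contract, the sketch below maps a worker function over its inputs with whatever pool class it is given, falling back to multiprocessing.pool.Pool when PoolClass is None. The names run_with_pool and square are hypothetical and not part of bigmultipipe; multiprocessing.pool.ThreadPool appears in the usage line only because it is a standard-library subclass of Pool that avoids starting worker processes.

```python
from multiprocessing.pool import Pool, ThreadPool


def square(x):
    # Module-level so it can be pickled when a process-based Pool is used.
    return x * x


def run_with_pool(items, PoolClass=None, processes=2):
    """Hypothetical helper: apply square() to items via PoolClass.map()."""
    if PoolClass is None:
        PoolClass = Pool  # mirrors the documented default
    with PoolClass(processes=processes) as pool:
        # map() blocks until all results are ready and preserves input order.
        return pool.map(square, items)


# ThreadPool subclasses Pool, so it exposes the same map() interface.
print(run_with_pool([1, 2, 3], PoolClass=ThreadPool))  # prints [1, 4, 9]
```

Whether a thread pool suits a real pipeline depends on the workload; the sketch only shows that any class with the Pool map() interface can be swapped in.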
Notes
Just like **kwargs, all named parameters passed at object instantiation are stored as properties and used to initialize the identical list of parameters to the BigMultiPipe.pipeline() method. Any of these parameters except pre_process_list and post_process_list can be overridden when pipeline() is called by using the corresponding keyword. This enables a default pipeline configuration to be defined when the object is instantiated and then modified at run time. The exceptions are pre_process_list and post_process_list: when these are provided to pipeline(), they are added to the corresponding lists provided at instantiation time. To erase these lists in the object, simply set their property to None, e.g. BigMultiPipe.pre_process_list = None.
Attributes Summary
post_process_list (list or None): List of post-processing routines
pre_process_list (list or None): List of pre-processing routines
Methods Summary
data_process(data, **kwargs): Process the data.
data_process_meta_create(data, **kwargs): Process data and create metadata
file_process(in_name, **kwargs): Process one file in the bigmultipipe system
file_read(in_name, **kwargs): Reads data file(s) from disk.
file_write(data, outname[, meta, create_outdir]): Create outdir if create_outdir is True.
kwargs_merge(**kwargs): Merge **kwargs with **kwargs provided on instantiation
outname_create(*args, **kwargs): Create output filename (including path) using bigmultipipe.outname_creator
pipeline(in_names[, num_processes, ...]): Runs pipeline, maximizing processing and memory resources
post_process(data[, post_process_list, ...]): Conduct post-processing tasks, including creation of metadata
pre_process(data[, pre_process_list]): Conduct pre-processing tasks
Attributes Documentation
- post_process_list
list or None : List of post-processing routines
List of functions called by post_process() after the primary processing step. Intended to enable additional processing steps and produce metadata as discussed in Discussion of Design. Each function must accept one positional parameter, data; one optional keyword parameter, bmp_meta; any keywords needed by the function; and an arbitrary list of keywords handled by the **kwargs feature. bmp_meta is of type dict. The return value of each function is intended to be the data but is not restricted in any way. If None is returned, processing stops for that file, None is returned for that file’s data, and the metadata accumulated to that point is returned as that file’s metadata. bmp_meta.clear() can be used in the terminating post_process_list routine if it is desirable to erase the metadata. See Example for examples of a simple functioning pipeline.
>>> import numpy as np
>>> def later_booster(data, need_to_boost_by=None, **kwargs):
...     if need_to_boost_by is not None:
...         data = data + need_to_boost_by
...     return data
...
>>> def median(data, bmp_meta=None, **kwargs):
...     m = np.median(data)
...     if bmp_meta is not None:
...         bmp_meta['median'] = m
...     return data
...
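To make the calling convention concrete, the following sketch chains post-processing functions as this attribute describes: each function receives the data plus a shared bmp_meta dict, and a None return stops processing while keeping the metadata gathered so far. The runner run_post_process is hypothetical, not the library's post_process(); it uses plain lists and statistics.median instead of NumPy so that it runs with the standard library alone.

```python
import statistics


def later_booster(data, need_to_boost_by=None, **kwargs):
    # bmp_meta, if passed, is absorbed by **kwargs and ignored here.
    if need_to_boost_by is not None:
        data = [x + need_to_boost_by for x in data]
    return data


def median(data, bmp_meta=None, **kwargs):
    # Record a metadata value without altering the data.
    if bmp_meta is not None:
        bmp_meta['median'] = statistics.median(data)
    return data


def run_post_process(data, post_process_list, **kwargs):
    """Hypothetical runner: apply each function in order, sharing one metadata dict."""
    bmp_meta = {}
    for func in post_process_list:
        data = func(data, bmp_meta=bmp_meta, **kwargs)
        if data is None:
            # Stop early; return the metadata accumulated so far.
            return None, bmp_meta
    return data, bmp_meta


print(run_post_process([1, 2, 3], [later_booster, median], need_to_boost_by=10))
# prints ([11, 12, 13], {'median': 12})
```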
- pre_process_list
list or None : List of pre-processing routines
List of functions called by pre_process() before the primary processing step. Intended to implement filtering and control features as described in Discussion of Design. Each function must accept one positional parameter, data; keyword arguments necessary for its internal functioning; and **kwargs to ignore keyword parameters not processed by the function. If the return value of the function is None, processing of that file stops, no output file is written, and None is returned instead of an output filename. This is how filtering is implemented. Otherwise, the return value is either data or a dict with two keys: bmp_data and bmp_kwargs. In the latter case, bmp_kwargs will be merged into **kwargs. This is how the control channel is implemented. Below are examples. See Example for this code in use in a functioning pipeline.
>>> def reject(data, reject_value=None, **kwargs):
...     if reject_value is None:
...         return data
...     if data[0,0] == reject_value:
...         # --> Return data=None to reject data
...         return None
...     return data
...
>>> def boost_later(data, boost_target=None, boost_amount=None, **kwargs):
...     if boost_target is None or boost_amount is None:
...         return data
...     if data[0,0] == boost_target:
...         add_kwargs = {'need_to_boost_by': boost_amount}
...         retval = {'bmp_data': data,
...                   'bmp_kwargs': add_kwargs}
...         return retval
...     return data
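The filtering and control-channel rules above can be sketched with a small hypothetical runner, run_pre_process, which is not the library's pre_process(). A None return rejects the data; a dict carrying bmp_data and bmp_kwargs replaces the data and merges bmp_kwargs into the keyword arguments seen by later functions. Detecting the control-channel dict by its bmp_data key is an assumption of this sketch, and nested lists indexed as data[0][0] stand in for the NumPy arrays used above.

```python
def reject(data, reject_value=None, **kwargs):
    if reject_value is None:
        return data
    if data[0][0] == reject_value:
        return None  # reject this data
    return data


def boost_later(data, boost_target=None, boost_amount=None, **kwargs):
    if boost_target is None or boost_amount is None:
        return data
    if data[0][0] == boost_target:
        # Send a keyword to downstream steps through the control channel.
        return {'bmp_data': data,
                'bmp_kwargs': {'need_to_boost_by': boost_amount}}
    return data


def run_pre_process(data, pre_process_list, **kwargs):
    """Hypothetical runner returning (data, kwargs); data is None if rejected."""
    for func in pre_process_list:
        retval = func(data, **kwargs)
        if retval is None:
            return None, kwargs
        if isinstance(retval, dict) and 'bmp_data' in retval:
            data = retval['bmp_data']
            kwargs = {**kwargs, **retval.get('bmp_kwargs', {})}
        else:
            data = retval
    return data, kwargs


data = [[5, 1], [2, 3]]
print(run_pre_process(data, [reject, boost_later], boost_target=5, boost_amount=10))
# prints ([[5, 1], [2, 3]], {'boost_target': 5, 'boost_amount': 10, 'need_to_boost_by': 10})
```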
Methods Documentation
- data_process(data, **kwargs)
Process the data. Intended to be overridden in a subclass.
- Parameters
- data : any type
Data to be processed
- Returns
- data : any type
Processed data
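Because data_process() is the hook for the primary processing step, a subclass will often override only that method. The sketch below assumes a hypothetical subclass, ScalePipe, that multiplies list data by a scale keyword. It imports BigMultiPipe when bigmultipipe is installed and otherwise defines a minimal stand-in base class so that the example runs on its own; the stand-in is not the real class.

```python
try:
    from bigmultipipe import BigMultiPipe
except ImportError:
    class BigMultiPipe:
        """Minimal stand-in used only when bigmultipipe is not installed."""

        def __init__(self, **kwargs):
            self.kwargs = kwargs

        def data_process(self, data, **kwargs):
            return data


class ScalePipe(BigMultiPipe):
    """Hypothetical subclass whose primary step scales each value."""

    def data_process(self, data, scale=1, **kwargs):
        # Keywords meant for other steps arrive in **kwargs and are ignored.
        return [x * scale for x in data]


print(ScalePipe().data_process([1, 2, 3], scale=2))  # prints [2, 4, 6]
```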
- data_process_meta_create(data, **kwargs)
Process data and create metadata
- Parameters
- data : any type
Data to be processed by pre_process(), data_process(), and post_process()
- kwargs : see Notes in the BigMultiPipe Parameters section
- Returns
- (data, meta) : tuple
Data is the processed data. Meta is created by post_process().
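The order of operations described for data_process_meta_create() can be sketched as a plain function that takes the three steps as callables. The function meta_create_sketch and the three helper steps are hypothetical, and returning (None, {}) when pre-processing rejects the data is an assumption based on the pipeline() return description rather than on the library's source.

```python
def meta_create_sketch(data, pre_process, data_process, post_process, **kwargs):
    """Hypothetical composition: pre-process, process, then post-process."""
    # pre_process returns (data, kwargs) so the control channel can add keywords.
    data, kwargs = pre_process(data, **kwargs)
    if data is None:
        # Assumed behavior for rejected data: no data and empty metadata.
        return None, {}
    data = data_process(data, **kwargs)
    # post_process returns (data, meta).
    return post_process(data, **kwargs)


def add_offset_kwarg(data, **kwargs):
    # Hypothetical pre-processing step that adds a keyword for later steps.
    return data, {**kwargs, 'offset': 1}


def apply_offset(data, offset=0, **kwargs):
    # Hypothetical primary processing step.
    return [x + offset for x in data]


def count_meta(data, **kwargs):
    # Hypothetical post-processing step that produces metadata.
    return data, {'n': len(data)}


print(meta_create_sketch([1, 2], add_offset_kwarg, apply_offset, count_meta))
# prints ([2, 3], {'n': 2})
```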
- file_process(in_name, **kwargs)
Process one file in the bigmultipipe system.
This method can be overridden to interface with applications where the primary processing routine already reads the input data from disk and writes the output data to disk.
- Parameters
- in_name : str
Name of file to process. Data from the file will be read by file_read() and processed by data_process_meta_create(). The output filename will be created by outname_create() and the data will be written by file_write().
- kwargs : see Notes in the BigMultiPipe Parameters section
- Returns
- (outname, meta) : tuple
Outname is the name of the file to which processed data was written. Meta is the dictionary element of the tuple returned by data_process_meta_create().
- file_read(in_name, **kwargs)
Reads data file(s) from disk. Intended to be overridden by a subclass.
- Parameters
- in_name : str or list
If str, name of file to read. If list, each element in the list is processed recursively so that multiple files can be considered a single “data” in bigmultipipe nomenclature.
- kwargs : see Notes in the BigMultiPipe Parameters section
- Returns
- data : any type
Data to be processed
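The recursive handling of a list-valued in_name might look like the following hypothetical reader, file_read_sketch, which treats each str as the name of a text file and returns nested lists for nested lists of names. A real subclass would read its own file format; plain text files are an assumption made to keep the sketch self-contained.

```python
def file_read_sketch(in_name, **kwargs):
    """Hypothetical file_read(): a str names one file, a list is read recursively."""
    if isinstance(in_name, list):
        # Each element may itself be a str or a list, so recurse.
        return [file_read_sketch(name, **kwargs) for name in in_name]
    with open(in_name) as f:
        return f.read()
```

Called with a list of two filenames, the sketch returns a two-element list that downstream steps can treat as a single piece of data.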
- file_write(data, outname, meta=None, create_outdir=False, **kwargs)
Create outdir if create_outdir is True. MUST be overridden by a subclass to actually write the data. In that case, an example first line of a subclass file_write(self, data, outname, **kwargs) method would be:
super().file_write(data, outname, **kwargs)
- Parameters
- data : any type
Processed data
- outname : str
Name of file to write
- meta : dict
BigMultiPipe metadata dictionary
- create_outdir : bool, optional
If True, create outdir and any needed parent directories. Does not raise an error if outdir already exists. Overridden by the create_outdir key in meta. Default is False.
- kwargs : see Notes in the BigMultiPipe Parameters section
- Returns
- outname : str
Name of file written
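A subclass file_write() that both honors create_outdir and writes the data might resemble the sketch below. It is a hypothetical standalone function, file_write_sketch, not the base-class code: it assumes outdir is the directory part of outname, lets a create_outdir key in meta take precedence as described above, and writes str(data) to a text file purely for illustration.

```python
import os


def file_write_sketch(data, outname, meta=None, create_outdir=False, **kwargs):
    """Hypothetical file_write(): optionally create the directory, then write."""
    if meta is not None and 'create_outdir' in meta:
        # A create_outdir key in meta overrides the keyword argument.
        create_outdir = meta['create_outdir']
    outdir = os.path.dirname(outname)
    if create_outdir and outdir:
        # Creates parent directories too; no error if outdir already exists.
        os.makedirs(outdir, exist_ok=True)
    with open(outname, 'w') as f:
        f.write(str(data))
    return outname
```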
- kwargs_merge(**kwargs)
Merge **kwargs with **kwargs provided on instantiation
Intended to be called by the other methods of the class
- Parameters
- **kwargs : keyword arguments
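The merge rule (run-time keywords win over those stored at instantiation), together with the rule in the Notes that run-time pre_process_list entries are added to the stored list rather than replacing it, can be sketched as follows. ConfigSketch and combined_pre_process_list are hypothetical names, the strings in the lists are placeholders for functions, and appending run-time entries after the stored ones is an assumed ordering.

```python
class ConfigSketch:
    """Hypothetical holder of instantiation-time settings."""

    def __init__(self, pre_process_list=None, **kwargs):
        self.pre_process_list = pre_process_list
        self.kwargs = kwargs  # captured at instantiation

    def kwargs_merge(self, **kwargs):
        # Later entries win, so run-time values override stored ones.
        return {**self.kwargs, **kwargs}

    def combined_pre_process_list(self, pre_process_list=None):
        # Run-time entries are added to, not substituted for, the stored list.
        return (self.pre_process_list or []) + (pre_process_list or [])


config = ConfigSketch(pre_process_list=['reject'], outdir='out_a', reject_value=3)
print(config.kwargs_merge(outdir='out_b'))
# prints {'outdir': 'out_b', 'reject_value': 3}
```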
- outname_create(*args, **kwargs)
Create output filename (including path) using bigmultipipe.outname_creator
- Returns
- outname : str
Output filename to be written, including path
- pipeline(in_names, num_processes=None, mem_available=None, mem_frac=None, process_size=None, PoolClass=None, outdir=None, create_outdir=None, outname_append=None, outname=None, **kwargs)
Runs pipeline, maximizing processing and memory resources
- Parameters
- in_names : list of str
List of input filenames. Each file is processed using file_process().
- All other parameters : see Parameters to BigMultiPipe
- Returns
- pout : list
List of tuples (outname, meta), one tuple for each in_name. Outname is str or None; meta is a dict containing metadata output. If outname is str, it is the name of the file to which the processed data were written. If outname is None and meta is {}, the convenience function prune_pout() can be used to remove this tuple from pout and the corresponding in_name from the in_names list.
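The pruning described for pout can be sketched as a filter over the paired pout and in_names lists. prune_pout_sketch is a hypothetical reimplementation written from the description above, not the library's prune_pout(), whose exact signature and return value may differ.

```python
def prune_pout_sketch(pout, in_names):
    """Drop (None, {}) entries from pout along with the matching in_names."""
    kept_pout, kept_names = [], []
    for (outname, meta), in_name in zip(pout, in_names):
        if outname is None and meta == {}:
            continue  # nothing was written and no metadata was produced
        kept_pout.append((outname, meta))
        kept_names.append(in_name)
    return kept_pout, kept_names


pout = [('a_bmp.dat', {'median': 1}), (None, {}), (None, {'median': 2})]
print(prune_pout_sketch(pout, ['a.dat', 'b.dat', 'c.dat']))
# prints ([('a_bmp.dat', {'median': 1}), (None, {'median': 2})], ['a.dat', 'c.dat'])
```

Note that an entry with outname None but non-empty metadata is kept, since only the (None, {}) case is described as removable.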
- post_process(data, post_process_list=None, no_bmp_cleanup=False, **kwargs)
Conduct post-processing tasks, including creation of metadata
This method can be overridden to permanently insert post-processing tasks in the pipeline for each instantiated object, or the post_process_list feature can be used for a more dynamic approach to inserting post-processing tasks at object instantiation and/or when the pipeline is run.
- Parameters
- data : any type
Data to be processed by the functions in post_process_list
- post_process_list : list
See documentation for this parameter in the Parameters section of BigMultiPipe
- no_bmp_cleanup : bool
Do not run the bmp_cleanup post-processing task even if keywords have been added to the bmp_cleanup_list (provided for debugging purposes). Default is False.
- kwargs : see Notes in the BigMultiPipe Parameters section
- Returns
- (data, meta) : tuple
Data are the post-processed data. Meta are the combined meta dicts from all of the post_process_list functions.
- pre_process(data, pre_process_list=None, **kwargs)
Conduct pre-processing tasks
This method can be overridden to permanently insert pre-processing tasks in the pipeline for each instantiated object, and/or the pre_process_list feature can be used for a more dynamic approach to inserting pre-processing tasks at object instantiation and/or when the pipeline is run.
- Parameters
- data : any type
Data to be processed by the functions in pre_process_list
- pre_process_list : list
See documentation for this parameter in the Parameters section of BigMultiPipe
- kwargs : see Notes in the BigMultiPipe Parameters section
- Returns
- (data, kwargs) : tuple
Data are the pre-processed data. Kwargs are the combined kwarg outputs from all of the pre_process_list functions.