pyphoon.io

Modules

module Description
pyphoon.io.h5 Reading and writing operations on H5 files.
pyphoon.io.tsv Reading and writing operations on TSV files.
pyphoon.io.utils Generic tools

pyphoon.io.h5

pyphoon.io.h5.get_h5_filenames(directory)

Obtains the list of H5 file names within the given directory. If the specified directory is itself an H5 file, the name of that file is returned.

Parameters:directory (str) – Path to a folder containing HDF5 files or a single HDF5 file.
Returns:List with the paths to all H5 files available according to the specified directory. List is empty if no file was found.
Return type:list
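
The behaviour described above can be sketched with the standard library alone. This is only an illustration, not pyphoon's implementation: the helper name list_h5_files and the assumption that H5 files end in ".h5" are hypothetical.

```python
import os


def list_h5_files(directory, extension=".h5"):
    """Hypothetical stand-in for get_h5_filenames (not pyphoon's code)."""
    # A path that points at a single H5 file is returned on its own.
    if os.path.isfile(directory):
        return [directory] if directory.endswith(extension) else []
    # A missing directory yields an empty list, as does a folder without H5 files.
    if not os.path.isdir(directory):
        return []
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(extension)
    )
```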
pyphoon.io.h5.read_source_images(path_to_folder)

Reads all image files within a given folder. Note that all images are assumed to have the same dimensionality. In addition, an image should have been stored as a dataset, with name ‘infrared’, in an HDF5 file.

Parameters:path_to_folder (str) – Complete path to the folder containing HDF image files.
Returns:NxWxH Numpy array (N: #images, W: image width, H: image height)
Return type:list
pyphoon.io.h5.read_source_image(path_to_file)

Reads an image from an HDF5 file. It assumes that the image was stored as a dataset with name ‘infrared’ in an HDF5 file.

Parameters:path_to_file (str) – Path to the HDF file storing the image.
Returns:Image of size WxH (W: image width, H: image height)
Return type:numpy.array
pyphoon.io.h5.write_image(path_to_file, image, compression='gzip')

Stores a given image in a dataset in an HDF5 file.

Parameters:
  • path_to_file (str) – Path to the HDF file storing the image.
  • image (numpy.array) – Image information.
  • compression (str) – Compression type.
pyphoon.io.h5.read_h5groupfile(path_to_file)

Reads an H5 file and returns its content as a dictionary. Note that the H5 file is assumed to contain a set of groups, each with two datasets (‘data’ and ‘ids’). The groups refer to the different data fields used as source data for Digital Typhoon.

Parameters:path_to_file (str) – Path to an H5 file.
Returns:Content of the HDF5 file as a dictionary. Keys stand for data field names and corresponding values are dictionaries with two fields:
  • data: Contains the data itself
  • ids: Contains the ids associated to the samples from data.

As a consequence, the returned object is a 2-nested dictionary.

Return type:dict
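
To make the 2-nested layout concrete, the sketch below builds such a dictionary by hand and checks its shape. The field names, the values and the helper check_group_layout are hypothetical, and the check assumes one id per sample, which the description suggests but does not state.

```python
# Hypothetical content; the field names and values are illustrative only.
content = {
    "field_a": {"data": [[0.1, 0.2], [0.3, 0.4]], "ids": ["id_0", "id_1"]},
    "field_b": {"data": [10, 20, 30], "ids": ["id_0", "id_1", "id_2"]},
}


def check_group_layout(content):
    """Return True if every field holds exactly 'data' and 'ids' of equal length."""
    return all(
        set(group) == {"data", "ids"} and len(group["data"]) == len(group["ids"])
        for group in content.values()
    )
```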
pyphoon.io.h5.write_h5groupfile(data, path_to_file, compression)

Constructs and stores an H5 file containing the given data.

Parameters:
  • data (dict) –

    Dictionary containing the data to be stored. Keys stand for data field names and corresponding values are dictionaries with two fields:

    • data: Contains the data itself.
    • ids: Contains the ids associated to the samples from data.

    Hence, data is a 2-nested dictionary.

  • path_to_file (str) – Path where the new H5 file will be created.
  • compression – Compression to apply to the H5 file. See the h5py documentation for details.
pyphoon.io.h5.read_h5_dataset_file(path_to_file)

Reads an HDF5 file and returns its content as a dictionary.

Parameters:path_to_file (str) – Path to an H5 file.
Returns:Content of the HDF5 file as a dictionary. Keys stand for data field names and values are the corresponding data.
Return type:dict
pyphoon.io.h5.write_h5_dataset_file(data, path_to_file, compression)

Constructs and stores an HDF5 file containing the given data.

Parameters:
  • data (dict) – Dictionary containing the data to be stored. Keys stand for data field names, values are the corresponding data.
  • path_to_file (str) – Path where the new H5 file will be created.
  • compression – Compression to apply to the H5 file. See the h5py documentation for details.

pyphoon.io.tsv

pyphoon.io.tsv.read_tsvs(path_best)

Reads all the files from the JMA directory and returns a list of N elements, each being a list of typhoon features. It assumes that path_best contains all the JMA .TSV data files.

Parameters:path_best (str) – Path to the directory containing the JMA .TSV data files.
Returns:List with the Best Track data of all samples in the .TSV files.
Return type:list
pyphoon.io.tsv.read_tsv(path_to_file)

Retrieves the data from a .TSV JMA data file.

Parameters:path_to_file (str) – Complete path to the TSV file.
Returns:List with JMA data extracted from given .TSV file. The length of the list is equal to the number of samples and each element in the list is a list with length equal to number of features.
Return type:list
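
The returned structure (one list per sample, one entry per feature) can be reproduced with Python's csv module. This is a sketch under assumptions: the helper name read_tsv_rows is hypothetical, and values are kept as strings because the conversion applied by pyphoon is not documented here.

```python
import csv


def read_tsv_rows(path_to_file):
    """Hypothetical reader: one list of string features per non-empty row."""
    with open(path_to_file, newline="") as handle:
        # csv.reader yields an empty list for blank lines, which are skipped.
        return [row for row in csv.reader(handle, delimiter="\t") if row]
```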
pyphoon.io.tsv.check_constant_distance_in_tsv(path_best, time_distance=3600)

Checks that within a typhoon sequence the time distance between consecutive image frames remains constant.

Parameters:
  • path_best (str) – Directory containing TSV files.
  • time_distance (int) – Distance between frames in seconds.
Returns:List providing, for each sequence (TSV file), the number of time gaps greater than time_distance without a satellite image. The n-th element of the list refers to the n-th typhoon sequence.
Return type:list
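
The gap count for a single sequence can be sketched as follows. The helper count_time_gaps is hypothetical and assumes the frame dates have already been extracted as datetime.datetime objects; how pyphoon derives them from the TSV files is not shown here.

```python
from datetime import datetime, timedelta


def count_time_gaps(dates, time_distance=3600):
    """Count consecutive pairs of dates separated by more than time_distance seconds."""
    limit = timedelta(seconds=time_distance)
    ordered = sorted(dates)
    return sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > limit)


# Frames at 00:00, 01:00, 02:00, 05:00 and 06:00: only the 02:00 -> 05:00 step exceeds one hour.
frames = [datetime(1998, 1, 1, hour) for hour in (0, 1, 2, 5, 6)]
print(count_time_gaps(frames))  # 1
```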


pyphoon.io.utils

Some tools to assist in reading source data.

pyphoon.io.utils.get_image_ids(sequence_folder)

Gets the ids of all image HDF5 files in sequence_folder. To convert a filename to an id, it uses imagefilename2id().

Parameters:sequence_folder (str) – Path to the folder containing images stored as single HDF5 files with the original naming convention.
Returns:List with the ids of all images within the folder sequence_folder.
Return type:list
pyphoon.io.utils.get_image_dates(sequence_folder)

Gets the dates from all image HDF5 files stored in sequence_folder. To convert a filename to a date, it uses imagefilename2date().

Parameters:sequence_folder (str) – Path to the folder containing images stored as single HDF5 files with the original naming convention.
Returns:List with the dates of all images within the folder sequence_folder.
Return type:list
pyphoon.io.utils.get_best_ids(best_data, seq_no)

Gets the ids for each sample in the best track data. It obtains the date from each sample using get_best_date() and converts it to a typhoon id using the sequence number seq_no and the method date2id().
Parameters:
  • best_data (numpy.array) – Array containing the Best Track data.
  • seq_no (str) – Name of the typhoon sequence.
Returns:List with the ids of all samples from input Best Track data.
Return type:list

pyphoon.io.utils.get_best_dates(best_data)

Gets the dates for each sample in the best track data. To extract the date from each sample it uses get_best_date().

Parameters:best_data (numpy.array) – Array containing the data from Best Track.
Returns:List of datetime.datetime elements.
Return type:list
pyphoon.io.utils.get_best_date(best_data_sample)

Gets the date of the best track sample best_data_sample. To this end, it uses the date features of the sample, namely the year, month, day and hour features.

Parameters:best_data_sample – Single sample from the Best Track data.
Returns:Date of the sample.
Return type:datetime.datetime
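
Combining the year, month, day and hour features into a date could look like the sketch below. The helper sample_date is hypothetical, and it assumes that the first four entries of a sample are year, month, day and hour, a layout that is not confirmed by this documentation.

```python
from datetime import datetime


def sample_date(sample):
    """Assumes sample[0:4] holds year, month, day, hour (an illustrative layout)."""
    year, month, day, hour = (int(value) for value in sample[:4])
    return datetime(year, month, day, hour)
```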
pyphoon.io.utils.id2date(identifier)

Gets the date of a typhoon image frame with id given by identifier. A typical id is in the format <seq_no>_<YYYYMMDD>, where seq_no denotes the sequence number (e.g. 199801).

Parameters:identifier (str) – Identifier of an image or best track frame.
Returns:Date of the frame
Return type:datetime.datetime
pyphoon.io.utils.id2seqno(identifier)

Gets the sequence number from a typhoon id, e.g. 199802_199980101 -> 199802.

Parameters:identifier (str) – Typhoon unique identifier, e.g. 199802_199980101.
Returns:Sequence number.
Return type:str
pyphoon.io.utils.date2id(date, seq_no)

Generates the id of an image frame sample using its date and the id of the typhoon sequence it belongs to.

Parameters:
  • date (datetime.datetime) – Date of the sample.
  • seq_no (str) – Typhoon sequence number, e.g. “199607”.
Returns:Id of the sample corresponding to the given sequence name and date.
Return type:str
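
The relationship between date2id(), id2date() and id2seqno() can be illustrated as a round trip. The function names below are hypothetical, and the date format is an assumption: the sketch uses YYYYMMDDHH, as in the description of imagefilename2id(), although the description of id2date() mentions YYYYMMDD.

```python
from datetime import datetime

ID_DATE_FORMAT = "%Y%m%d%H"  # assumption: YYYYMMDDHH, as in imagefilename2id


def make_id(date, seq_no):
    """Hypothetical counterpart of date2id: '<seq_no>_<YYYYMMDDHH>'."""
    return "{}_{}".format(seq_no, date.strftime(ID_DATE_FORMAT))


def id_to_seqno(identifier):
    """Hypothetical counterpart of id2seqno: text before the underscore."""
    return identifier.split("_")[0]


def id_to_date(identifier):
    """Hypothetical counterpart of id2date: parse the text after the underscore."""
    return datetime.strptime(identifier.split("_")[1], ID_DATE_FORMAT)
```

With these definitions, an id built from a date and a sequence number maps back to the same date and sequence number.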

pyphoon.io.utils.hdffile2name(path_h5file)

Given a path to an HDF5 file, it obtains the file name (without format extension), e.g. file.h5 -> file.

Parameters:path_h5file (str) – Path to an HDF file.
Returns:Name of file without format extension.
Return type:str
pyphoon.io.utils.folder2name(path_folder)

Given a path to a folder it obtains the name of the folder alone. E.g. /path/to/some/folder -> folder

Parameters:path_folder (str) – Path to a folder.
Returns:Name of folder.
Return type:str
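
Both name helpers map naturally onto os.path. The functions below are hypothetical equivalents written for illustration; they are not taken from pyphoon.

```python
import os


def file_stem(path_h5file):
    """Hypothetical equivalent of hdffile2name: '/path/to/file.h5' -> 'file'."""
    return os.path.splitext(os.path.basename(path_h5file))[0]


def folder_name(path_folder):
    """Hypothetical equivalent of folder2name: '/path/to/some/folder' -> 'folder'."""
    # normpath drops a trailing separator so that 'folder/' still yields 'folder'.
    return os.path.basename(os.path.normpath(path_folder))
```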
pyphoon.io.utils.imagefilename2id(filename)

Gets the id of an image sample. The id is generated using two main components:

  • The date of the typhoon sample.
  • The typhoon sequence number.

Note that image frames from different typhoon sequences can share the same date when they were recorded at the same time. Therefore, the final id is constructed using both the date and the typhoon sequence number.

To build the image id, the name of the original HDF file is used, which has the following structure:

YYYYMMDDHH-<typhoon id>-<satellite model>.h5

We can then parse it to the id, namely:

<typhoon id>_YYYYMMDDHH.

Parameters:filename (str) – HDF image filename.
Returns:Image frame identifier.
Return type:int
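
The parsing step described above can be sketched as follows. The helper filename_to_id and the example file name used with it are hypothetical. The sketch returns a string, since the id format contains an underscore.

```python
import os


def filename_to_id(filename):
    """Hypothetical parser: 'YYYYMMDDHH-<typhoon id>-<satellite model>.h5' -> '<typhoon id>_YYYYMMDDHH'."""
    name = os.path.basename(filename)
    # Only the first two dash-separated parts are needed: the date and the typhoon id.
    date_part, typhoon_id = name.split("-")[:2]
    return "{}_{}".format(typhoon_id, date_part)
```

For example, filename_to_id("1998010100-199801-SAT1.h5") gives "199801_1998010100".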
pyphoon.io.utils.imagefilename2date(filename)

Extracts the date from a file with a specific filename. To obtain the image date from the filename, the filename must have the following structure:

YYYYMMDDHH-<typhoon id>-<satellite model>.h5

Parameters:filename (str) – Name of the HDF image file.
Returns:Date the image with a given filename was taken.
Return type:datetime.datetime
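
Extracting the date from such a file name amounts to parsing its first ten characters. The helper filename_to_date below is a hypothetical sketch, not pyphoon's implementation, and the file name in the usage note is made up.

```python
import os
from datetime import datetime


def filename_to_date(filename):
    """Hypothetical parser for 'YYYYMMDDHH-<typhoon id>-<satellite model>.h5'."""
    name = os.path.basename(filename)
    # The leading dash-separated part is the timestamp in YYYYMMDDHH form.
    return datetime.strptime(name.split("-")[0], "%Y%m%d%H")
```

For example, filename_to_date("1998010100-199801-SAT1.h5") gives datetime(1998, 1, 1, 0).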