pyphoon.io
Modules
Module | Description |
---|---|
pyphoon.io.h5 | Reading and writing operations on H5 files. |
pyphoon.io.tsv | Reading and writing operations on TSV files. |
pyphoon.io.utils | Generic tools. |
pyphoon.io.h5
- pyphoon.io.h5.get_h5_filenames(directory)
  Obtains the list of H5 file names within the given directory. If the specified directory is itself an H5 file, the name of that file is returned.
  Parameters: directory (str) – Path to a folder containing HDF5 files, or to a single HDF5 file.
  Returns: List with the paths to all H5 files found under the specified directory. The list is empty if no file was found.
  Return type: list
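The behaviour described above can be approximated with the standard library alone. The following is a minimal sketch, not the pyphoon implementation: the helper name `list_h5_files`, the `.h5` extension check and the recursive directory walk are all assumptions made for illustration.

```python
import os


def list_h5_files(path):
    """Return paths of .h5 files under `path`, or [path] if it is an .h5 file.

    Hypothetical helper; the extension check and recursion are assumptions.
    """
    if os.path.isfile(path):
        # A single file was given: return it only if it looks like an H5 file.
        return [path] if path.endswith(".h5") else []
    found = []
    # os.walk yields nothing for a missing directory, so the result is empty.
    for root, _dirs, files in os.walk(path):
        for name in sorted(files):
            if name.endswith(".h5"):
                found.append(os.path.join(root, name))
    return found
```

As in the documented function, an empty list signals that no file was found, so callers can test the result directly with `if not files:`.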
- pyphoon.io.h5.read_source_images(path_to_folder)
  Reads all image files within a given folder. All images are assumed to have the same dimensions, and each image must have been stored as a dataset named ‘infrared’ in an HDF5 file.
  Parameters: path_to_folder (str) – Complete path to the folder containing the HDF5 image files.
  Returns: NxWxH NumPy array (N: number of images, W: image width, H: image height).
  Return type: list
- pyphoon.io.h5.read_source_image(path_to_file)
  Reads an image from an HDF5 file. It assumes that the image was stored as a dataset named ‘infrared’ in an HDF5 file.
  Parameters: path_to_file (str) – Path to the HDF5 file storing the image.
  Returns: Image of size WxH (W: image width, H: image height).
  Return type: numpy.array
- pyphoon.io.h5.write_image(path_to_file, image, compression='gzip')
  Stores a given image as a dataset in an HDF5 file.
  Parameters:
  - path_to_file (str) – Path to the HDF5 file in which the image is stored.
  - image (numpy.array) – Image data.
  - compression (str) – Compression type.
- pyphoon.io.h5.read_h5groupfile(path_to_file)
  Reads an H5 file and returns its content as a dictionary. The H5 file is assumed to contain a set of groups, each with two datasets (‘data’ and ‘ids’). The groups correspond to the different data fields used as source data for Digital Typhoon.
  Parameters: path_to_file (str) – Path to an H5 file.
  Returns: Content of the HDF5 file as a dictionary. Keys stand for data field names and the corresponding values are dictionaries with two fields:
  - data: Contains the data itself.
  - ids: Contains the ids associated with the samples in data.
  As a consequence, the returned object is a 2-level nested dictionary.
  Return type: dict
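To make the 2-level nested layout concrete, the sketch below builds a small dictionary of that shape by hand and walks over it. The field names `image` and `best`, the values, and the helper `iter_samples` are hypothetical, and the one-to-one pairing of `ids` with `data` entries is an assumption drawn from the description above.

```python
def iter_samples(content):
    """Yield (field, sample_id, sample) triples from a 2-level nested dictionary.

    Assumes each group holds 'data' and 'ids' entries of equal length.
    """
    for field, group in content.items():
        for sample_id, sample in zip(group["ids"], group["data"]):
            yield field, sample_id, sample


# Hypothetical content: field names and values are made up for illustration.
content = {
    "image": {"data": [[0.1, 0.2], [0.3, 0.4]], "ids": ["a", "b"]},
    "best": {"data": [[1998, 1], [1998, 2]], "ids": ["a", "b"]},
}
triples = list(iter_samples(content))
```

Plain lists stand in for the arrays here so that the example runs without h5py or NumPy.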
- pyphoon.io.h5.write_h5groupfile(data, path_to_file, compression)
  Constructs and stores an H5 file containing the given data.
  Parameters:
  - data (dict) – Dictionary containing the data to be stored. Keys stand for data field names and the corresponding values are dictionaries with two fields:
    - data: Contains the data itself.
    - ids: Contains the ids associated with the samples in data.
    Hence, data is a 2-level nested dictionary.
  - path_to_file (str) – Path where the new H5 file will be created.
  - compression – Compression used for the H5 file. See the h5py documentation for details.
- pyphoon.io.h5.read_h5_dataset_file(path_to_file)
  Reads an HDF5 file and returns its content as a dictionary.
  Parameters: path_to_file (str) – Path to an H5 file.
  Returns: Content of the HDF5 file as a dictionary. Keys stand for data field names and values are the corresponding data.
  Return type: dict
- pyphoon.io.h5.write_h5_dataset_file(data, path_to_file, compression)
  Constructs and stores an HDF5 file containing the given data.
  Parameters:
  - data (dict) – Dictionary containing the data to be stored. Keys stand for data field names and values are the corresponding data.
  - path_to_file (str) – Path where the new H5 file will be created.
  - compression – Compression used for the H5 file. See the h5py documentation for details.
pyphoon.io.tsv
- pyphoon.io.tsv.read_tsvs(path_best)
  Reads all files in the JMA directory and returns a list of N elements, each being a list of typhoon features. It assumes that path_best contains all the JMA .TSV data files.
  Parameters: path_best (str) – Path to the directory containing the JMA .TSV data files.
  Returns: List with the best track data of all samples in the .TSV files.
  Return type: list
- pyphoon.io.tsv.read_tsv(path_to_file)
  Retrieves the data from a JMA .TSV data file.
  Parameters: path_to_file (str) – Complete path to the TSV file.
  Returns: List with the JMA data extracted from the given .TSV file. The length of the list equals the number of samples, and each element is a list whose length equals the number of features.
  Return type: list
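The list-of-lists shape returned for a TSV file can be reproduced with the standard `csv` module. This is a minimal sketch under stated assumptions, not the pyphoon code: the helper name `parse_tsv` is hypothetical, and the values are left as strings, whereas the documented function may convert them to numbers.

```python
import csv


def parse_tsv(path):
    """Read a tab-separated file into a list of rows (one list per sample).

    Hypothetical helper; values are kept as strings in this sketch.
    """
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t")]
```

Each inner list holds the features of one sample, so `len(rows)` gives the number of samples and `len(rows[0])` the number of features.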
- pyphoon.io.tsv.check_constant_distance_in_tsv(path_best, time_distance=3600)
  Checks that, within a typhoon sequence, the time distance between consecutive image frames remains constant.
  Parameters:
  - path_best (str) – Directory containing TSV files.
  - time_distance (int) – Distance between frames, in seconds.
  Returns: List providing, for each sequence (TSV file), the number of time gaps greater than time_distance without a satellite image. The n-th element in the list refers to the n-th typhoon sequence.
  Return type: list
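The per-sequence count described above amounts to comparing consecutive timestamps against a threshold. The sketch below illustrates that idea for a single sequence. The helper name `count_time_gaps` is hypothetical, and it takes an already sorted list of `datetime` objects rather than a directory of TSV files.

```python
from datetime import datetime, timedelta


def count_time_gaps(dates, time_distance=3600):
    """Count consecutive pairs of dates separated by more than `time_distance` seconds.

    Hypothetical helper; `dates` is assumed to be sorted in ascending order.
    """
    limit = timedelta(seconds=time_distance)
    return sum(1 for prev, cur in zip(dates, dates[1:]) if cur - prev > limit)


# Frames at 00:00, 01:00, 03:00 and 04:00: only the 01:00 -> 03:00 step exceeds one hour.
frames = [datetime(1998, 1, 1, hour) for hour in (0, 1, 3, 4)]
gaps = count_time_gaps(frames)
```

Applying such a function to every sequence in turn would give one count per sequence, matching the shape of the documented return value.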
pyphoon.io.utils
Some tools to assist in reading source data.
- pyphoon.io.utils.get_image_ids(sequence_folder)
  Gets the ids of all image HDF5 files in sequence_folder. To convert a filename to an id it uses imagefilename2id().
  Parameters: sequence_folder (str) – Path to the folder containing images stored as single HDF5 files with the original naming convention.
  Returns: List with the ids of all images within the folder sequence_folder.
  Return type: list
- pyphoon.io.utils.get_image_dates(sequence_folder)
  Gets the dates of all image HDF5 files stored in sequence_folder. To convert a filename to a date it uses imagefilename2date().
  Parameters: sequence_folder (str) – Path to the folder containing images stored as single HDF5 files with the original naming convention.
  Returns: List with the dates of all images within the folder sequence_folder.
  Return type: list
- pyphoon.io.utils.get_best_ids(best_data, seq_no)
  Gets the ids for each sample in the best track data. It obtains the date of each sample using get_best_date() and converts it to a typhoon id using the sequence number seq_no and the method date2id().
  Parameters:
  - best_data (numpy.array) – Array containing the Best Track data.
  - seq_no (str) – Name of the typhoon sequence.
  Returns: List with the ids of all samples from the input Best Track data.
  Return type: list
- pyphoon.io.utils.get_best_dates(best_data)
  Gets the dates for each sample in the best track data. To extract the date of each sample it uses get_best_date().
  Parameters: best_data (numpy.array) – Array containing the data from Best Track.
  Returns: List of datetime.datetime elements.
  Return type: list
- pyphoon.io.utils.get_best_date(best_data_sample)
  Gets the date of the best track data sample best_data_sample. To this end, it uses the date features of the sample, namely the year, month, day and hour features.
  Parameters: best_data_sample –
  Returns:
- pyphoon.io.utils.id2date(identifier)
  Gets the date of a typhoon image frame with the id given by identifier. A typical id has the format <seq_no>_<YYYYMMDD>, where seq_no denotes the sequence number (e.g. 199801).
  Parameters: identifier (str) – Identifier of an image or best track frame.
  Returns: Date of the frame.
  Return type: datetime.datetime
- pyphoon.io.utils.id2seqno(identifier)
  Gets the sequence number from a typhoon id. E.g., 199802_199980101 -> 199802
  Parameters: identifier (str) – Typhoon unique identifier, e.g. 199802_199980101.
  Returns: Sequence number.
  Return type: str
- pyphoon.io.utils.date2id(date, seq_no)
  Generates the id of an image frame sample using its date and the id of the typhoon sequence it belongs to.
  Parameters:
  - date (datetime.datetime) – Date of the sample.
  - seq_no (str) – Typhoon sequence number, e.g. “199607”.
  Returns: Id of the sample corresponding to the given sequence name and date.
  Return type: str
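The conversions between ids and dates can be sketched with `datetime` formatting. This is an illustration only: the helper names `make_frame_id` and `frame_id_to_date` are hypothetical, and the id layout `<seq_no>_YYYYMMDDHH` is an assumption taken from the description of imagefilename2id() (the id2date() entry quotes a shorter `<seq_no>_<YYYYMMDD>` form).

```python
from datetime import datetime


def make_frame_id(date, seq_no):
    """Build an id of the assumed form <seq_no>_YYYYMMDDHH."""
    return "{}_{}".format(seq_no, date.strftime("%Y%m%d%H"))


def frame_id_to_date(identifier):
    """Recover the date from an id of the assumed form <seq_no>_YYYYMMDDHH."""
    _seq_no, stamp = identifier.split("_")
    return datetime.strptime(stamp, "%Y%m%d%H")


# Round trip with a made-up date in sequence "199607".
frame_id = make_frame_id(datetime(1996, 7, 21, 6), "199607")
frame_date = frame_id_to_date(frame_id)
```

Under this assumed layout the sequence number is simply the part before the underscore, which mirrors what id2seqno() is documented to return.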
- pyphoon.io.utils.hdffile2name(path_h5file)
  Given a path to an HDF5 file, obtains the file name without the format extension. E.g., file.h5 -> file
  Parameters: path_h5file (str) – Path to an HDF5 file.
  Returns: Name of the file without the format extension.
  Return type: str
- pyphoon.io.utils.folder2name(path_folder)
  Given a path to a folder, obtains the name of the folder alone. E.g. /path/to/some/folder -> folder
  Parameters: path_folder (str) – Path to a folder.
  Returns: Name of the folder.
  Return type: str
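Both name helpers correspond to simple `os.path` operations. The sketch below shows one way to obtain the same results. The function names `file_stem` and `folder_name` are hypothetical, and the handling of a trailing slash is an assumption rather than documented behaviour.

```python
import os


def file_stem(path_h5file):
    """Return the file name without directory or extension, e.g. file.h5 -> file."""
    return os.path.splitext(os.path.basename(path_h5file))[0]


def folder_name(path_folder):
    """Return the last component of a folder path, ignoring a trailing slash."""
    return os.path.basename(os.path.normpath(path_folder))
```

`os.path.normpath` strips a trailing separator first, so `/path/to/some/folder/` and `/path/to/some/folder` give the same name.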
- pyphoon.io.utils.imagefilename2id(filename)
  Gets the id of an image sample. The id is generated using two main components:
  - The date of the typhoon sample.
  - The typhoon sequence number.
  Note that frames from different typhoon sequences may share the same date, since they can be recorded at the same time. The final id is therefore built from both the date and the typhoon sequence number.
  To build the image id, the name of the original HDF5 file is used, which has the following structure:
  YYYYMMDDHH-<typhoon id>-<satellite model>.h5
  It is then parsed into the id, namely:
  <typhoon id>_YYYYMMDDHH
  Parameters: filename (str) – HDF5 image filename.
  Returns: Image frame identifier.
  Return type: int
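The filename-to-id mapping described above is a reordering of the filename components. The sketch below illustrates it. The helper name `filename_to_id` and the satellite name in the usage line are hypothetical, and the sketch returns a string, which is an assumption given that the documented return type is int.

```python
import os


def filename_to_id(filename):
    """Map YYYYMMDDHH-<typhoon id>-<satellite model>.h5 to <typhoon id>_YYYYMMDDHH."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    # Split into at most three parts so a dash in the satellite model is tolerated.
    stamp, typhoon_id, _satellite = stem.split("-", 2)
    return "{}_{}".format(typhoon_id, stamp)


# Made-up filename following the documented structure.
example_id = filename_to_id("1998010100-199802-GMS5.h5")
```

Placing the typhoon id first keeps ids from the same sequence together when sorted, and the date suffix distinguishes frames within it.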
- pyphoon.io.utils.imagefilename2date(filename)
  Extracts the date from a file with a specific filename. To obtain the image date, the filename must have the following structure:
  YYYYMMDDHH-<typhoon id>-<satellite model>.h5
  Parameters: filename (str) – Name of the HDF5 image file.
  Returns: Date on which the image with the given filename was taken.
  Return type: datetime.datetime
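Extracting the date only requires the leading `YYYYMMDDHH` component of the filename. A minimal sketch, assuming the documented structure holds. The helper name `filename_to_date` and the sample filename are hypothetical.

```python
import os
from datetime import datetime


def filename_to_date(filename):
    """Parse the leading YYYYMMDDHH part of an image filename into a datetime."""
    stamp = os.path.basename(filename).split("-", 1)[0]
    return datetime.strptime(stamp, "%Y%m%d%H")


# Made-up filename following the documented structure.
example_date = filename_to_date("1998010100-199802-GMS5.h5")
```

`datetime.strptime` raises `ValueError` when the leading component does not match the expected pattern, which gives an early signal that a file does not follow the naming convention.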