pyphoon.db¶
This package provides tools to build and manage a database of Digital Typhoon data. To get familiar with it, have a look at the example here.
Content¶
Module | Description |
---|---|
pyphoon.db.data_extractor | Extracts data from a PDManager instance and generates a dataset ready to be used for training. |
pyphoon.db.pd_manager | Encapsulates typhoon dataset information, including corrected and original image versions. |
pyphoon.db.data_extractor¶
-
class
pyphoon.db.data_extractor.
DataExtractor
(original_images_dir, corrected_images_dir, pd_manager)¶ Bases:
object
Data extractor. Operates with DataFrames, created by PDManager.
-
get_good_triplets
(seq_no, allow_corrected=True)¶ Gets triplets of frames (three consecutive frames) from a given sequence in which none of the frames is missing.
Parameters: - seq_no (int) – Number of the sequence (ID).
- allow_corrected (bool) – Allow corrected images to be included in triplets.
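The selection rule described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function name `good_triplets` and the status labels "ok", "corrected" and "missing" are assumptions made for this example.

```python
def good_triplets(frame_status, allow_corrected=True):
    """Return (i, i+1, i+2) index triplets in which no frame is missing.

    frame_status is a hypothetical list with one label per frame:
    "ok", "corrected" or "missing".
    """
    usable = {"ok", "corrected"} if allow_corrected else {"ok"}
    triplets = []
    # Slide a window of three consecutive frames over the sequence.
    for i in range(len(frame_status) - 2):
        window = frame_status[i:i + 3]
        if all(status in usable for status in window):
            triplets.append((i, i + 1, i + 2))
    return triplets


status = ["ok", "ok", "corrected", "ok", "missing", "ok", "ok", "ok"]
print(good_triplets(status))
# [(0, 1, 2), (1, 2, 3), (5, 6, 7)]
print(good_triplets(status, allow_corrected=False))
# [(5, 6, 7)]
```

With `allow_corrected=False`, any window that contains a corrected frame is rejected in the same way as a window with a missing frame.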
-
generate_images_shuffled_chunks
(images_per_chunk, output_dir, seed=0, use_corrected=True, preprocess_algorithm=None, display=False)¶ Generates HDF5 chunk files containing shuffled images from different sequences together with Best Track data. Typhoon IDs are ignored during shuffling, so images from the same typhoon sequence may end up in the training, validation and test sets.
Parameters: - images_per_chunk (int) – Number of images to be stored per chunk.
- output_dir (str) – Directory to store chunk files.
- seed (int, default 0) – Seed for random shuffle.
- use_corrected (bool) – Set to True to include corrected images in the dataset.
- preprocess_algorithm (callable) – Algorithm for data preprocessing, which returns data of the same shape as its input.
- display (bool) – Flag for displaying execution information.
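The effect of seeded shuffling followed by fixed-size chunking can be sketched with the standard library alone. The helper `shuffled_chunks` and the (sequence ID, frame number) pairs below are hypothetical and only mirror the `images_per_chunk` and `seed` parameters; the actual method additionally writes HDF5 files, which this sketch does not attempt.

```python
import random


def shuffled_chunks(items, images_per_chunk, seed=0):
    """Shuffle items reproducibly and split them into fixed-size chunks."""
    shuffled = list(items)
    # A dedicated Random instance keeps the result independent of global state.
    random.Random(seed).shuffle(shuffled)
    return [
        shuffled[start:start + images_per_chunk]
        for start in range(0, len(shuffled), images_per_chunk)
    ]


# Hypothetical (sequence ID, frame number) pairs from two typhoon sequences.
frames = [("A", n) for n in range(5)] + [("B", n) for n in range(5)]
chunks = shuffled_chunks(frames, images_per_chunk=4, seed=0)

for index, chunk in enumerate(chunks):
    sequences = sorted({seq for seq, _ in chunk})
    print(index, len(chunk), sequences)
```

Because the shuffle ignores sequence boundaries, one chunk can hold frames from several sequences. If chunks are later assigned to different splits, frames of one typhoon can therefore appear on both sides of a split.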
-
get_full_filenames
(use_corrected=True)¶ Gets the series of full paths, preferring entries from the corrected DataFrames.
Parameters: use_corrected (bool) – Flag for including corrected images or not. Returns: List with paths of images. Return type: list
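The "prefer corrected entries" behaviour can be illustrated with two dictionaries. This is a sketch under assumptions: the function `full_paths`, the image IDs and the file paths are all hypothetical, and the real method works on the DataFrames held by PDManager rather than on dictionaries.

```python
def full_paths(original, corrected, use_corrected=True):
    """Map each image ID to one path, preferring the corrected version."""
    paths = dict(original)
    if use_corrected:
        # Corrected entries override the original entry with the same ID.
        paths.update(corrected)
    return [paths[image_id] for image_id in sorted(paths)]


# Hypothetical image IDs and paths, for illustration only.
original = {"img_001": "original/img_001.h5", "img_002": "original/img_002.h5"}
corrected = {"img_002": "corrected/img_002.h5"}

print(full_paths(original, corrected))
# ['original/img_001.h5', 'corrected/img_002.h5']
print(full_paths(original, corrected, use_corrected=False))
# ['original/img_001.h5', 'original/img_002.h5']
```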
-
generate_images_besttrack_chunks
(sequence_list, chunk_size, output_dir, preprocess_algorithm=None, display=False)¶ Generates HDF5 chunk files containing images and Best Track data.
Parameters: - sequence_list (list) – List of tuples [(seq_no, prefix), …]
- chunk_size (int) – Size of chunks in bytes.
- output_dir (str) – Output directory.
- preprocess_algorithm (callable) – Algorithm for data preprocessing, which returns data of the same shape as its input.
- display (bool) – Flag for displaying execution information.
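One way to read a byte-based `chunk_size` is greedy packing of whole sequences until the limit would be exceeded. The sketch below assumes that interpretation; the source does not state how the library fills its chunks, and `group_by_size`, the sequence names and the sizes are hypothetical.

```python
def group_by_size(sequence_sizes, chunk_size):
    """Greedily pack (seq_no, size_in_bytes) pairs into chunks.

    A new chunk starts when adding the next sequence would exceed
    chunk_size. A sequence larger than chunk_size gets its own chunk.
    """
    chunks, current, current_size = [], [], 0
    for seq_no, size in sequence_sizes:
        if current and current_size + size > chunk_size:
            chunks.append(current)
            current, current_size = [], 0
        current.append(seq_no)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


# Hypothetical sequence names and sizes in bytes.
sizes = [("seq_a", 400), ("seq_b", 500), ("seq_c", 300),
         ("seq_d", 1200), ("seq_e", 100)]
print(group_by_size(sizes, chunk_size=1000))
# [['seq_a', 'seq_b'], ['seq_c'], ['seq_d'], ['seq_e']]
```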
-
read_seq
(seq_no, features, preprocess_algorithm=None)¶ Reads the features of a given typhoon sequence.
Parameters: - seq_no (str) – Number of typhoon sequence to be retrieved.
- features (list) – Features to be obtained.
- preprocess_algorithm (callable) – Algorithm for data preprocessing, which returns data of the same shape as an input.
Returns: tuple with three elements:
- images:
- images_ids:
- feature_data:
Return type: tuple
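The `preprocess_algorithm` parameter expects a callable whose output has the same shape as its input. A minimal sketch of such a callable is shown below. It uses nested lists so that it runs without third-party packages, whereas real image data would normally be an array. The function `rescale` and the sample values are assumptions made for this example.

```python
def rescale(images, low=0.0, high=1.0):
    """Linearly rescale pixel values to [low, high], keeping the input shape."""
    flat = [pixel for image in images for row in image for pixel in row]
    smallest, largest = min(flat), max(flat)
    span = (largest - smallest) or 1.0  # avoid division by zero on constant input
    return [
        [[low + (pixel - smallest) / span * (high - low) for pixel in row]
         for row in image]
        for image in images
    ]


# One hypothetical 2x2 image with arbitrary pixel values.
batch = [[[200.0, 250.0], [300.0, 250.0]]]
print(rescale(batch))
# [[[0.0, 0.5], [1.0, 0.5]]]
```

The output keeps the nesting of the input (one image, two rows, two columns), which is the property the parameter description asks for.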
-
pyphoon.db.pd_manager¶
-
class
pyphoon.db.pd_manager.
PDManager
(compression='gzip')¶ Bases:
object
Class to manage and help in the analysis of the dataset. It stores references to the image files, dates of the data, corrected images etc. in pandas.DataFrame objects.
-
besttrack
= None¶ DataFrame for best track data.
-
images
= None¶ DataFrame for original image data.
-
missing
= None¶ DataFrame for information about missing images.
-
corrected
= None¶ DataFrame for corrected image data.
-
add_original_images
(directory)¶ Adds information about original images to the class attribute images.
Parameters: directory (str) – Path to image dataset.
-
save_original_images
(filename)¶ Saves the class attribute images as a pickle file.
Parameters: filename (str) – Path to the pickle file.
-
load_original_images
(filename)¶ Loads the image data from a pickle file as a DataFrame and stores it in the class attribute images.
Parameters: filename (str) – Path to the pickle file.
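The save and load methods of this class come in pairs that write an attribute to a pickle file and read it back. The round trip can be sketched with the standard library as follows. This assumes that the `compression='gzip'` constructor argument refers to how the pickle files are compressed, which the source does not confirm. The helpers `save_table` and `load_table` and the sample records are hypothetical.

```python
import gzip
import os
import pickle
import tempfile


def save_table(table, filename):
    """Write a picklable object to a gzip-compressed pickle file."""
    with gzip.open(filename, "wb") as handle:
        pickle.dump(table, handle)


def load_table(filename):
    """Read an object back from a gzip-compressed pickle file."""
    with gzip.open(filename, "rb") as handle:
        return pickle.load(handle)


# Hypothetical records standing in for the rows of a DataFrame.
records = [{"seq_no": "example_seq", "frame": 0},
           {"seq_no": "example_seq", "frame": 1}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "images.pkl.gz")
    save_table(records, path)
    restored = load_table(path)

print(restored == records)
# True
```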
-
add_besttrack
(directory)¶ Adds information from the best track data to the class attribute besttrack.
Parameters: directory (str) – Path where source files are stored
-
save_besttrack
(filename)¶ Saves the class attribute besttrack as a pickle file.
Parameters: filename (str) – Path to the pickle file.
-
load_besttrack
(filename)¶ Loads the best track data from a pickle file as a DataFrame and stores it in the class attribute besttrack.
Parameters: filename (str) – Path to the pickle file.
-
add_corrected_images
(directory)¶ Adds information about the corrected images to the class attribute corrected.
Parameters: directory (str) – Path to image dataset.
-
save_corrected_images
(filename)¶ Saves the class attribute corrected as a pickle file.
Parameters: filename (str) – Path to the pickle file.
-
load_corrected_images
(filename)¶ Loads the corrected image data from a pickle file as a DataFrame and stores it in the class attribute corrected.
Parameters: filename (str) – Path to the pickle file.
-
add_corrected_info
(orig_images_dir, corrected_dir)¶ Adds information about corrected images to the corrected dataset.
Parameters: - orig_images_dir (str) – original images folder.
- corrected_dir (str) – corrected images folder.
Raises: Exception
-
add_missing_images_info
()¶ Creates a dataset with information about missing images.
Raises: Exception
-
save_missing_images
(filename)¶ Saves the class attribute missing (the DataFrame of missing images) as a pickle file.
Parameters: filename (str) – Path to the pickle file.
-
load_missing_images_info
(filename)¶ Loads the DataFrame of missing images from a pickle file and stores it in the class attribute missing.
Parameters: filename (str) – Path to the pickle file.
-
add_frames
()¶ Adds frame numbers to the original images DataFrame. Both the original images DataFrame and the missing DataFrame must be loaded beforehand.
Raises: Exception
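Why both DataFrames are needed can be seen from a small sketch: if frame numbers count every expected observation, the missing timestamps must be part of the timeline so that the available images keep their true positions. The sketch assumes this numbering scheme. The function `number_frames` and the hourly timestamps are hypothetical and not taken from the library.

```python
from datetime import datetime, timedelta


def number_frames(observed, missing):
    """Number all timestamps of one sequence from 0 in chronological order.

    Returns a mapping from each observed timestamp to its frame number.
    """
    timeline = sorted(set(observed) | set(missing))
    frame_of = {stamp: index for index, stamp in enumerate(timeline)}
    return {stamp: frame_of[stamp] for stamp in observed}


# Arbitrary start time with one hypothetical observation per hour.
start = datetime(2000, 1, 1, 0)
hours = [start + timedelta(hours=h) for h in range(5)]
observed = [hours[0], hours[1], hours[3], hours[4]]
missing = [hours[2]]

frames = number_frames(observed, missing)
print([frames[stamp] for stamp in observed])
# [0, 1, 3, 4]
```

Frame number 2 stays reserved for the missing image, so gaps in the sequence remain visible in the numbering.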
-
get_obs_time_from_frame_num
(seq_no, frame_num)¶ Returns the Timestamp object for a missing frame (numbering starts at 0).
Parameters: - seq_no (str) – Number of the sequence.
- frame_num (int) – Number of the image in the sequence.
Returns:
-
get_image_from_seq_no_and_frame_num
(seq_no, frame_num)¶ Parameters: - seq_no (str) – number of sequence.
- frame_num (int) – number of image in sequence
Returns:
-