pyphoon.db

This package provides tools to build and manage a database of Digital Typhoon data. To get familiar with it, you may want to have a look at the example here.

Content

Module – Description
pyphoon.db.data_extractor – Extracts data from a PDManager instance and generates a dataset ready to be used for training.
pyphoon.db.pd_manager – Encapsulates typhoon dataset information, including corrected and original image versions.

pyphoon.db.data_extractor

class pyphoon.db.data_extractor.DataExtractor(original_images_dir, corrected_images_dir, pd_manager)

Bases: object

Data extractor. Operates on the DataFrames created by PDManager.

get_good_triplets(seq_no, allow_corrected=True)

Gets triplets of frames (three consecutive frames) from a given sequence in which none of the frames is missing.

Parameters:
  • seq_no (int) – Sequence number (ID).
  • allow_corrected (bool) – Whether to allow corrected images in the triplets.
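
The selection rule can be sketched in plain Python. This is only an illustration under assumptions, not the library's implementation: the helper good_triplets and its input, a list of booleans marking whether each frame of a sequence is available, are hypothetical.

```python
def good_triplets(frame_present):
    """Return index triplets (i, i+1, i+2) in which no frame is missing.

    frame_present is a hypothetical list of booleans, one per frame of a
    sequence, where True means the frame is available.
    """
    triplets = []
    for i in range(len(frame_present) - 2):
        # Keep the triplet only if all three consecutive frames are present.
        if all(frame_present[i:i + 3]):
            triplets.append((i, i + 1, i + 2))
    return triplets


# Frame 3 is missing, so every triplet touching it is dropped.
present = [True, True, True, False, True, True, True]
print(good_triplets(present))  # [(0, 1, 2), (4, 5, 6)]
```

A single missing frame removes up to three candidate triplets, which is why sequences with many gaps may yield few usable samples.
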
generate_images_shuffled_chunks(images_per_chunk, output_dir, seed=0, use_corrected=True, preprocess_algorithm=None, display=False)

Generates HDF5 chunk files containing shuffled images from different sequences together with their Best Track data. Typhoon IDs are ignored during shuffling, so images belonging to the same typhoon sequence may end up in the training, validation and/or test sets.

Parameters:
  • images_per_chunk (int) – Number of images to be stored per chunk.
  • output_dir (str) – Directory to store chunk files.
  • seed (int, default 0) – Seed for random shuffle.
  • use_corrected (bool) – Set to True to include corrected images in the training dataset.
  • preprocess_algorithm (callable) – Algorithm for data pre-processing, which returns data of the same shape as its input.
  • display (bool) – Flag for displaying execution information.
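
The effect of seed and images_per_chunk can be illustrated with a small stand-alone sketch. It assumes a seeded shuffle followed by fixed-size slicing, and the helper shuffled_chunks and the sample (sequence, frame) pairs are hypothetical. The library's actual shuffling and HDF5 writing may differ.

```python
import random


def shuffled_chunks(items, images_per_chunk, seed=0):
    """Shuffle items reproducibly and split them into fixed-size chunks."""
    shuffled = list(items)  # copy, so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    return [
        shuffled[start:start + images_per_chunk]
        for start in range(0, len(shuffled), images_per_chunk)
    ]


# Hypothetical (sequence, frame) pairs from two typhoon sequences.
images = [("A", i) for i in range(4)] + [("B", i) for i in range(4)]
chunks = shuffled_chunks(images, images_per_chunk=3, seed=0)
print([len(chunk) for chunk in chunks])  # chunk sizes: [3, 3, 2]
```

Because shuffling happens across sequences, frames of the same typhoon can land in different chunks. If chunks are later assigned to different splits, nearly identical frames may appear in both training and test data.
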
get_full_filenames(use_corrected=True)

Gets the full_path series, preferring entries from the corrected DataFrame.

Parameters: use_corrected (bool) – Flag for including corrected images or not.
Returns: List with paths of images.
Return type: list
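
One way to picture "preferring corrected entries" is a dictionary merge in which corrected paths override original ones. The sketch below is an assumption about the behaviour, not the library's code: the function prefer_corrected, the image IDs and the file paths are all hypothetical.

```python
def prefer_corrected(original_paths, corrected_paths, use_corrected=True):
    """Merge two {image_id: path} mappings, letting corrected paths win."""
    merged = dict(original_paths)
    if use_corrected:
        # Entries present in both mappings are taken from the corrected one.
        merged.update(corrected_paths)
    return [merged[image_id] for image_id in sorted(merged)]


# Hypothetical image IDs and paths.
original = {"img_001": "original/img_001.h5", "img_002": "original/img_002.h5"}
corrected = {"img_002": "corrected/img_002.h5"}
print(prefer_corrected(original, corrected))
```
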
generate_images_besttrack_chunks(sequence_list, chunk_size, output_dir, preprocess_algorithm=None, display=False)

Generates HDF5 chunk files containing images and Best Track data.

Parameters:
  • sequence_list (list) – List of tuples [(seq_no, prefix), …]
  • chunk_size (int) – Size of chunks in bytes.
  • output_dir (str) – Directory to store chunk files.
  • preprocess_algorithm (callable) – Algorithm for data preprocessing, which returns data of the same shape as its input.
  • display (bool) – Flag for displaying execution information.
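
How the library decides where one chunk ends is not stated here. As a rough illustration of a byte-based chunk_size, the sketch below greedily groups items so that each group stays within a byte budget. The helper group_by_byte_budget and the item sizes are hypothetical.

```python
def group_by_byte_budget(sizes, chunk_size):
    """Greedily group item indices so each group stays within chunk_size bytes.

    An item larger than chunk_size still gets a group of its own.
    """
    groups, current, current_bytes = [], [], 0
    for index, size in enumerate(sizes):
        # Start a new group when adding this item would exceed the budget.
        if current and current_bytes + size > chunk_size:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += size
    if current:
        groups.append(current)
    return groups


# Hypothetical per-item sizes in bytes, grouped under a 1000-byte budget.
print(group_by_byte_budget([400, 400, 400, 900, 100], chunk_size=1000))
# [[0, 1], [2], [3, 4]]
```
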
read_seq(seq_no, features, preprocess_algorithm=None)

Reads the features of a given typhoon sequence.

Parameters:
  • seq_no (str) – Number of the typhoon sequence to be retrieved.
  • features (list) – Features to be obtained.
  • preprocess_algorithm (callable) – Algorithm for data preprocessing, which returns data of the same shape as its input.
Returns:

tuple with three elements:

  • images
  • images_ids
  • feature_data

Return type: tuple


pyphoon.db.pd_manager

class pyphoon.db.pd_manager.PDManager(compression='gzip')

Bases: object

Class to manage the dataset and help with its analysis. It stores references to the image files, dates of the data, corrected images, etc. in pandas.DataFrame objects.

besttrack = None

DataFrame for best track data.

images = None

DataFrame for original image data.

missing = None

DataFrame for information about missing images.

corrected = None

DataFrame for corrected image data.

add_original_images(directory)

Adds information about original images to the class attribute images.

Parameters: directory (str) – Path to image dataset.
save_original_images(filename)

Saves the class attribute images as a pickle file.

Parameters: filename (str) – Path to the pickle file.
load_original_images(filename)

Loads the image data from a pickle file as a DataFrame and stores it as the class attribute images.

Parameters: filename (str) – Path to the pickle file.
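
The three methods for original images suggest a build-once, load-later pattern. The sketch below shows that pattern under assumptions: load_or_build_images is a hypothetical helper, and RecordingManager is a stand-in that only records calls so the example runs without pyphoon installed. With the real library, a PDManager instance would be passed instead.

```python
import os


def load_or_build_images(manager, images_dir, pickle_path):
    """Load the images DataFrame from a pickle, building it on first use.

    manager is expected to be a PDManager-like object exposing the three
    documented methods used below.
    """
    if os.path.exists(pickle_path):
        manager.load_original_images(pickle_path)
    else:
        manager.add_original_images(images_dir)
        manager.save_original_images(pickle_path)
    return manager


class RecordingManager:
    """Stand-in for PDManager that only records which methods were called."""

    def __init__(self):
        self.calls = []

    def add_original_images(self, directory):
        self.calls.append("add_original_images")

    def save_original_images(self, filename):
        self.calls.append("save_original_images")

    def load_original_images(self, filename):
        self.calls.append("load_original_images")


# No pickle exists at this hypothetical path, so the data is built and saved.
demo = load_or_build_images(RecordingManager(), "images/", "no_such_cache.pkl")
print(demo.calls)
```

The same pattern would apply to the besttrack, corrected and missing DataFrames, each of which has its own add, save and load methods.
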
add_besttrack(directory)

Adds information from the best track data to the class attribute besttrack.

Parameters: directory (str) – Path where source files are stored.
save_besttrack(filename)

Saves the class attribute besttrack as a pickle file.

Parameters: filename (str) – Path to the pickle file.
load_besttrack(filename)

Loads the best track data from a pickle file as a DataFrame and stores it as the class attribute besttrack.

Parameters: filename (str) – Path to the pickle file.
add_corrected_images(directory)

Adds information about the corrected images to the class attribute corrected.

Parameters: directory (str) – Path to image dataset.
save_corrected_images(filename)

Saves the class attribute corrected as a pickle file.

Parameters: filename (str) – Path to the pickle file.
load_corrected_images(filename)

Loads the corrected image data from a pickle file as a DataFrame and stores it as the class attribute corrected.

Parameters: filename (str) – Path to the pickle file.
add_corrected_info(orig_images_dir, corrected_dir)

Adds information about corrected images to the corrected dataset.

Parameters:
  • orig_images_dir (str) – Original images folder.
  • corrected_dir (str) – Corrected images folder.
Raises:

Exception

add_missing_images_info()

Creates a dataset with information about missing images.

Raises: Exception
save_missing_images(filename)

Saves the missing DataFrame to a pickle file.

Parameters: filename (str) – Path to the pickle file.
load_missing_images_info(filename)

Loads the missing DataFrame from a pickle file.

Parameters: filename (str) – Path to the pickle file.
add_frames()

Adds frame numbers to the original images DataFrame. Both the original images and the missing DataFrames should be loaded beforehand.

Raises: Exception
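
The required call order can be sketched as follows. The helper prepare_frames and the pickle file names are hypothetical, and StubManager is a stand-in used only to make the sketch runnable without pyphoon. Its check in add_frames mimics the documented prerequisite and is an assumption about when the real method raises.

```python
def prepare_frames(manager, images_pickle, missing_pickle):
    """Load both prerequisite DataFrames, then add frame numbers."""
    manager.load_original_images(images_pickle)
    manager.load_missing_images_info(missing_pickle)
    manager.add_frames()
    return manager


class StubManager:
    """Stand-in for PDManager; stores markers instead of real DataFrames."""

    def __init__(self):
        self.images = None
        self.missing = None
        self.frames_added = False

    def load_original_images(self, filename):
        self.images = "loaded"

    def load_missing_images_info(self, filename):
        self.missing = "loaded"

    def add_frames(self):
        # Assumed behaviour: fail if either DataFrame has not been loaded.
        if self.images is None or self.missing is None:
            raise Exception("images and missing must be loaded first")
        self.frames_added = True


# Hypothetical pickle file names.
manager = prepare_frames(StubManager(), "images.pkl", "missing.pkl")
print(manager.frames_added)  # True
```
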
get_obs_time_from_frame_num(seq_no, frame_num)

Returns the Timestamp object related to a missing frame (numbering starts from 0).

Parameters:
  • seq_no (str) – Number of the sequence.
  • frame_num (int) – Number of the image in the sequence.
Returns:

get_image_from_seq_no_and_frame_num(seq_no, frame_num)
Parameters:
  • seq_no (str) – Number of the sequence.
  • frame_num (int) – Number of the image in the sequence.
Returns: