cvjson package
cvjson.cvj module
Author: Benjamin Anderson Garrard
Official documentation https://bengarrard.bitbucket.io/
This script creates a handle object for JSON in the COCO format. The aim of this handle is to reduce redundant code, make that code safer and easier to use, and make data extraction simple.
This API uses CVJ as the superclass; all other subclasses are meant to extend its functionality.
The structure this library uses is as follows.
annotations :
[{
"id": int,
"image_id": int,
"category_id": int,
"segmentation": RLE or [polygon],
"area": float,
"bbox": [x,y,width,height],
"iscrowd": 0 or 1,
}]
categories : [{
"id": int,
"name": str,
"supercategory": str,
}],
images : [{
"id": int,
"file_name": str,
"width": int,
"height": int
}],
For more information on how this structure was chosen, read “Introduction to the CVJ”.
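As a rough illustration of the structure above, the following sketch builds a tiny dataset by hand and round-trips it through JSON text. The file name, ids, and category values are made up for the example and do not come from the library.

```python
import json

# A hand-written, minimal dataset in the structure shown above.
# All values here are illustrative placeholders.
coco = {
    "images": [
        {"id": 1, "file_name": "0001.png", "width": 640, "height": 480},
    ],
    "categories": [
        {"id": 7, "name": "car", "supercategory": "vehicle"},
    ],
    "annotations": [
        {
            "id": 100,
            "image_id": 1,        # refers to images[0]["id"]
            "category_id": 7,     # refers to categories[0]["id"]
            "segmentation": [],   # no polygon supplied in this sketch
            "area": 1200.0,
            "bbox": [10, 20, 40, 30],  # [x, y, width, height]
            "iscrowd": 0,
        },
    ],
}

# Round-trip through JSON text to confirm the structure serializes cleanly.
round_tripped = json.loads(json.dumps(coco))
```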
-
class
cvjson.cvj.
CVJ
(json_path=None, image_folder_path=None)¶ Bases:
object
The CVJ class is the most basic class and only gives information based on the json file supplied. It describes the data in the json file and its accompanying files (images, etc.) or helps turn that information into usable data. Anything beyond gaining insight into, or gathering data from, the json is provided in the form of an extension.
- Dictionary enums:
- IMID_2_ANNS, Image ID to Annotations
- CLID_2_NAME, Class ID to Class Name
- CLNAME_2_CLID, Class Name to Class ID
- IMID_2_FNAME, Image ID to File Name
- FNAME_2_IMID, File Name to Image ID
- IMID_2_FPATH, Image ID to File Path
- IMID_2_IMATTR, Image ID to Image Attributes
- CLID_2_ANNS, Class ID to Annotations
- CLID_2_ANNS = 7
- CLID_2_NAME = 1
- CLNAME_2_CLID = 2
- FNAME_2_IMID = 4
- IMID_2_ANNS = 0
- IMID_2_FNAME = 3
- IMID_2_FPATH = 5
- IMID_2_IMATTR = 6
- NEGATIVE_CLASS = 63428483
-
categ_idx_to_coco_categ
(id) This method creates a pseudo COCO category annotation. Apart from the class id (AKA the category id), this type of annotation carries little useful information.
Parameters: id (int) – This parameter is the class id, AKA the category id Returns: dict – - keys = “id”, “name”, “supercategory”. For further explanation see “Introduction to the CVJ”
Return type: dict
-
clean_categories
(save=False) This method cleans the categories of the internal json data. If a category has no annotations, it is removed from the internal json data.
Parameters: save (bool, optional) – (Default value = False) If True, the internal json data is saved to the json file that was used to give the CVJ object its data. Returns: list – The return value is named “remove_list”; it is a list of the categories that have been removed from the internal json data. Return type: list
-
clean_images
(save=False) This method cleans the images of the internal json data. If an image has no annotations, it is removed from the internal json data.
Parameters: save (bool, optional) – (Default value = False) If True, the internal json data is saved to the json file that was used to give the CVJ object its data. Returns: list – The return value is named “remove_list”; it is a list of the image attributes that have been removed from the internal json data. Return type: list
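The cleaning rule described for clean_categories() and clean_images() can be sketched in plain Python as follows. This is only an illustration of the rule, not the library's implementation: drop_unused is a hypothetical helper, it works on a bare dictionary rather than a CVJ object, and it does not model the save option.

```python
def drop_unused(json_data):
    """Remove images and categories that no annotation refers to.

    Hypothetical stand-in for the cleaning rule described above.
    Mutates json_data and returns (removed_images, removed_categories).
    """
    used_image_ids = {ann["image_id"] for ann in json_data["annotations"]}
    used_category_ids = {ann["category_id"] for ann in json_data["annotations"]}

    # Collect what will be removed so the caller can inspect it.
    removed_images = [
        img for img in json_data["images"] if img["id"] not in used_image_ids
    ]
    removed_categories = [
        cat for cat in json_data["categories"] if cat["id"] not in used_category_ids
    ]

    # Keep only the entries that at least one annotation points at.
    json_data["images"] = [
        img for img in json_data["images"] if img["id"] in used_image_ids
    ]
    json_data["categories"] = [
        cat for cat in json_data["categories"] if cat["id"] in used_category_ids
    ]
    return removed_images, removed_categories


data = {
    "images": [
        {"id": 1, "file_name": "a.png", "width": 10, "height": 10},
        {"id": 2, "file_name": "b.png", "width": 10, "height": 10},
    ],
    "categories": [
        {"id": 7, "name": "car", "supercategory": "vehicle"},
        {"id": 8, "name": "bear", "supercategory": "animal"},
    ],
    "annotations": [{"id": 100, "image_id": 1, "category_id": 7}],
}
removed_images, removed_categories = drop_unused(data)
```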
-
create_empty_json
()¶ This method assists in making the json for a COCO format.
Returns: dict – - keys = “images”, “categories”, “annotations”. Example found in “Introduction to the CVJ”
Return type: dict
-
create_json
(new_json, save_path=None)¶ This method creates a json file from a dictionary that is supplied. If no save path is supplied then it creates a file in the folder containing the json path supplied to the object and the file name will be “_new_json_DEFAULT.json”
Parameters: - new_json (dict) –
- save_path (string) – (Default value = None) Needs to be a path with a file name
Returns: dict –
- The same dict that was supplied
Return type: dict
-
create_json_by_class
(list_of_ids, verbose=True)¶ This method creates a json based on the class id’s supplied. The json will only have annotations for those classes.
The json is not saved in this method
Parameters: - list_of_ids (list) – A list of class ids to be included in to the json dictionary
- verbose (bool) – (Default value = True) If True, this prints the current iteration count while the method searches through the annotations in the json.
Returns: dict –
- keys = class ids
- values = annotations associated with each class
Return type: dict
-
create_json_of_class_focused_images
(list_of_class_ids)¶ This method generates a dictionary in the COCO format json that this library uses from a file generated by the cropper known as “{TIMESTAMP}_image_class_counts.json”.
That file has the filepaths to each image in the “{TIMESTAMP}_coco_train.json” and the class the images that were created were based off of.
Using that file, this method, given a list of ids, turns the selected class id’s in to a new json that has the images most strongly associated with those IDs. By strong I mean that class will be in the center of these images if the crop_images_bbox_centered() was used. There could be more annotations for another class on the image.
NOTE: This method does not save the dictionary. The user must save it. create_json() can do the trick.
Parameters: list_of_class_ids (list) – A list of class ids to single out for the new json Returns: dict – - keys = “images”, “annotations”, “categories”
- values = [image], [annotation], [category]
Return type: dict
-
entry_bbox
(bbox, class_id, image_id, id)¶ This method assists with entering valid annotations.
Parameters: - bbox (list) – This parameter is bounding box coordinates in the format of [x, y, width, height]
- class_id (int) – This parameter is the class id also known as the category id
- image_id (int) – This parameter is the image id that this annotation belongs to
- id (int) – This parameter is the id of the annotation
Returns: dict –
- keys = annotation format found in “Introduction to the CVJ”
- values = the values supplied.
Return type: dict
-
entry_img
(file_name, height, width, id)¶ This method assists with entering an image and the attributes of that image.
Parameters: - file_name (string) – This parameter is the filename of the image that is being inserted in to the json file
- height (int) – This parameter is the height of the image. Normally this is equivalent to img.shape[0] when using numpy
- width (int) – This parameter is the width of the image. Normally this is equivalent to img.shape[1] when using numpy
- id (int) – This parameter is the image id. The image id and the file name are the most crucial components of this entry; without the image id, most of the functions in this object will not help the user.
Returns: dict –
- keys = “file_name”, “height”, “width”, “id”. For more explanation see “Introduction to the CVJ”
Return type: dict
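The two entry helpers above can be pictured with the sketch below. The function names make_image_entry and make_bbox_entry are hypothetical, and the choices to fill "area" with width times height, leave "segmentation" empty, and set "iscrowd" to 0 are assumptions made for the illustration, not documented behavior.

```python
def make_image_entry(file_name, height, width, image_id):
    """Build an image entry with the keys listed for entry_img()."""
    return {"file_name": file_name, "height": height, "width": width, "id": image_id}


def make_bbox_entry(bbox, class_id, image_id, annotation_id):
    """Build an annotation entry from an [x, y, width, height] box."""
    x, y, w, h = bbox
    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": class_id,
        "segmentation": [],       # assumption: no polygon for a plain bbox
        "area": float(w * h),     # assumption: area taken from the box itself
        "bbox": [x, y, w, h],
        "iscrowd": 0,             # assumption: single-object annotation
    }


image_entry = make_image_entry("0001.png", height=480, width=640, image_id=1)
bbox_entry = make_bbox_entry([10, 20, 40, 30], class_id=7, image_id=1, annotation_id=100)
```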
-
get_annotations
()¶ This method returns all of the annotations from the internal json data of the CVJ object.
Returns: list Return type: list
-
get_average_area_by_class
(show_plot=True)¶ This method will get the average area for each class and returns the values in a dictionary. If the show_plot param is true then it plots the dictionary.
Parameters: show_plot (bool) – (Default value = True) Returns: dict – - Keys = class ids
- values = average areas associated with each class id
Return type: dict
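The averaging described above can be sketched without the library. The helper name average_area_by_class is hypothetical, it takes a bare list of annotations, and plotting is left out.

```python
from collections import defaultdict


def average_area_by_class(annotations):
    """Return {class id: mean of the "area" values for that class}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ann in annotations:
        totals[ann["category_id"]] += ann["area"]
        counts[ann["category_id"]] += 1
    return {class_id: totals[class_id] / counts[class_id] for class_id in totals}


annotations = [
    {"category_id": 1, "area": 100.0},
    {"category_id": 1, "area": 300.0},
    {"category_id": 2, "area": 50.0},
]
averages = average_area_by_class(annotations)
```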
-
get_average_side_lengths
(show_plot=True) This method goes through the annotations and builds a list of the square roots of their areas.
Parameters: show_plot (bool) – (Default value = True) Returns: list – A list of the square roots of the areas.
If the show_plot parameter is True, seaborn will plot the list and wait for input.
-
get_category_ids
()¶ This method returns all of the category ids from the internal json data of the CVJ object.
Returns: list Return type: list
-
get_category_names
()¶ This method returns all of the category names from the internal json data of the CVJ object.
Returns: list Return type: list
-
get_class_count_by_filename
(filename, show_plot=True) This method counts how many bounding boxes of each class exist on the image with the supplied filename.
NOTE: The filename supplied must be a part of the json file that is stored within the object.
Parameters: - filename (string) – This parameter is the filename of the image for which the user wants to find out how many bounding boxes are on the image and what their classes are
- show_plot (bool) – (Default value = True) This parameter when set to true generates a bar plot showing the class id on the y axis
Returns: dict –
- keys = class ids
- values = count of annotations for each of the class ids
Return type: dict
-
get_class_count_by_img_id
(img_id, show_plot=True) This method counts how many bounding boxes of each class exist on the image with the supplied image id. If the show_plot variable is True, it generates a bar plot for the image id with the classes on the x axis and the counts on the y axis.
NOTE: The image id supplied must be a part of the json file that is stored within the object.
Parameters: - img_id (int) – This parameter is the image id of the image for which the user wants to find out how many bounding boxes are on the image and what their classes are
- show_plot (bool) – (Default value = True) This parameter when set to true generates a bar plot showing the class ids on the x axis
Returns: dict –
- keys = class ids
- values = count of annotations for each of the class ids
Return type: dict
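The per-image class count can be sketched with collections.Counter. The helper name class_count_for_image is hypothetical and the bar plot is omitted.

```python
from collections import Counter


def class_count_for_image(annotations, img_id):
    """Return {class id: number of annotations} for a single image id."""
    counts = Counter(
        ann["category_id"] for ann in annotations if ann["image_id"] == img_id
    )
    return dict(counts)


annotations = [
    {"image_id": 1, "category_id": 5},
    {"image_id": 1, "category_id": 5},
    {"image_id": 1, "category_id": 2},
    {"image_id": 2, "category_id": 5},
]
counts_for_image_1 = class_count_for_image(annotations, img_id=1)
```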
-
get_class_id_2_anns
(class_id=None, json_data=None, verbose=False)¶ This method returns a dictionary that has the class id as the key and the annotations to that class as the values.
If there is already one created it just returns the previously made one to improve performance.
Parameters: - class_id (int) – (Default = None) This is the class ID for the annotations associated with that class.
- json_data (dict) –
(Default value = None) This is the loaded data from a COCO formatted JSON file.
- If this is supplied all data returned will be from this variable.
- verbose (bool) – (Default value = False)
Returns: dict –
- This is only returned if class_id is not supplied to the method.
- keys = class ids
- values = annotations associated with each class id
-
get_class_id_2_anns_count
(show_plot=True)¶ This function gets the count of bboxes by class ID. If show_plot is True (Default) then this will have a seaborn barchart pop up.
Parameters: show_plot (bool) – (Default value = True) Returns: dict – - keys = class ids
- values = count of annotations for each of the class ids
Return type: dict
-
get_class_id_2_name
(class_id=None, json_data=None)¶ This method creates a dictionary using the class id, AKA “category id”, as the keys and the category names, AKA class names, as the values.
If there is already one created it just returns the previously made one to improve performance.
Parameters: - class_id (int) – (Default = None) This is the class ID for a class that is in the JSON data of the object or the supplied JSON data from the json_data variable.
- json_data (dict) – (Default value = None) This is the loaded data from a COCO formatted JSON file. If this is supplied, all data returned will be from this variable.
Returns: dict –
- This is only returned if there is no class_id supplied to the method.
- keys = category ids
- values = category names associated with each category id
Return type: dict
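Several of the lookup methods state that a previously built dictionary is returned instead of being rebuilt. One way that build-once pattern could look is sketched below. LookupCache is a hypothetical class written for this illustration, and it does not model the json_data override described above.

```python
class LookupCache:
    """Hypothetical helper showing a build-once lookup dictionary."""

    def __init__(self, json_data):
        self.json_data = json_data
        self._class_id_2_name = None  # built lazily on first request
        self.build_count = 0          # only here to make the caching visible

    def get_class_id_2_name(self, class_id=None):
        # Build the dictionary once, then reuse it on every later call.
        if self._class_id_2_name is None:
            self._class_id_2_name = {
                cat["id"]: cat["name"] for cat in self.json_data["categories"]
            }
            self.build_count += 1
        if class_id is not None:
            return self._class_id_2_name[class_id]
        return self._class_id_2_name


cache = LookupCache(
    {
        "categories": [
            {"id": 1, "name": "bear", "supercategory": "animal"},
            {"id": 2, "name": "car", "supercategory": "vehicle"},
        ]
    }
)
whole_mapping = cache.get_class_id_2_name()
single_name = cache.get_class_id_2_name(class_id=2)
```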
-
get_class_name_2_id
(class_name=None, json_data=None) This method creates a dictionary using the category names as the keys and the category ids as the values.
If there is already one created it just returns the previously made one to improve performance.
Parameters: - class_name (string) – (Default = None) This is the class name for a class that is in the JSON data of the object or the supplied JSON data from the json_data variable.
- json_data (dict) –
(Default value = None) This is the loaded data from a COCO formatted JSON file.
- If this is supplied all data returned will be from this variable.
Returns: int (int) – If class_name is supplied then this method returns the class ID.
dict (dict) –
- This is only returned if there is no class_name supplied to the method.
- keys = class names like “bear”, “car”, “alien”, “person”, etc
- values = the category ids, also known as the class ids: the number that represents each class.
-
get_count_files_by_class
(verbose=False, show_plot=False) This method is only used with the image_class_count.json file generated by the Cropper class. It shows how many files were made for each class. If cropping to the bounding box center was used, each class will have either as many images as it has bounding boxes or more, through augmentations.
Parameters: - verbose (bool) – (Default value = False) If True, information is printed to the console while the data is gathered. The output will look similar to “Class ID 5 has 280 images”
- show_plot (bool) – (Default value = False) This parameter when set to true generates a bar plot showing the class id on the x axis
Returns: dict –
- keys = class ids
- values = image counts. (How many images are associated with the class id)
Return type: dict
-
get_dictionary
(cvj_enum) The dictionary enums are:
- CVJ.IMID_2_ANNS = Image ID to Annotations
- CVJ.CLID_2_NAME = Class ID to Class Name
- CVJ.CLNAME_2_CLID = Class Name to Class ID
- CVJ.IMID_2_FNAME = Image ID to File Name
- CVJ.FNAME_2_IMID = File Name to Image ID
- CVJ.IMID_2_FPATH = Image ID to File Path
- CVJ.IMID_2_IMATTR = Image ID to Image Attributes
- CVJ.CLID_2_ANNS = Class ID to Annotations
-
get_distribution_of_area
(show_plot=True)¶ This method makes a list of the areas from the annotations and just appends them.
Parameters: show_plot (bool) – (Default value = True) Returns: list – A list of appended areas of the bounding boxes. Return type: list Example
So an example of what the returned list could look like is:
[100,100,2000,3000,4000,2405,500,50,500]
If the show_plot parameter is True, seaborn will plot the list and wait for input.
-
get_distribution_of_class_id
(show_plot=True)¶ This method makes a list of the category id’s from the annotations and just appends them.
Parameters: show_plot (bool) – (Default value = True) Example
So an example of what the list could look like is:
[1,1,2,3,4,5,5,5,5]
If the show_plot parameter is True, seaborn will plot the list and wait for input.
-
get_filename_2_image_id
(filename=None, json_data=None)¶ This method creates a dictionary using the filename as the key and the image id as the value.
If there is already one created it just returns the previously made one to improve performance.
Parameters: - filename (string) – (Default = None) This is the file name for an image that is in the JSON data of the object or the supplied JSON data from the json_data variable.
- json_data (dict) –
(Default value = None) This is the loaded data from a COCO formatted JSON file.
- If this is supplied all data returned will be from this variable.
Returns: int (int) – This is the image ID of the file name that was supplied to this method.
dict (dict) –
- This is only returned if there is no file name supplied to the method.
- keys = filenames with the extension, so they will have “.png”, “.tif”, or something similar
- values = image ids
-
get_filenames
() This method returns the filenames of images from the internal json data of the CVJ object.
Returns: list Return type: list
-
get_image_id_2_anns
(img_id=None, json_data=None)¶ This method creates a dictionary using the “image_id” as the key and the value is the annotations list that is described at the beginning of this script. If there is already one created it just returns the previously made one to improve performance.
Parameters: - img_id (int) – (Default = None) This is the image ID for an image that is in the JSON data of the object or the supplied JSON data from the json_data variable.
- json_data (dict) – (Default value = None) This is the loaded data from a COCO formatted JSON file. If this is supplied, all data returned will be from this variable.
Returns: dict –
- This is only returned if img_id is not supplied to the method.
- keys = image ids
- values = annotations associated with the image id
Return type: dict
-
get_image_id_2_filename
(img_id=None, json_data=None)¶ This method creates a dictionary using the image id as the key and the values are the filenames associated with the image id.
If there is already one created it just returns the previously made one to improve performance.
Parameters: - img_id (int) – (Default = None) This is the image ID for an image that is in the JSON data of the object or the supplied JSON data from the json_data variable.
- json_data (dict) – (Default value = None) This is the loaded data from a COCO formatted JSON file. If this is supplied, all data returned will be from this variable.
Returns: string (string) – If the img_id is supplied then this method will return the file name associated with that image id.
dict (dict) –
- This is only returned if there is no img_id supplied to the method.
- keys = image ids
- values = filenames with the extension, so they will have “.png”, “.tif”, or something similar
-
get_image_id_2_filepath
(img_id=None)¶ This method will not work unless an image filepath has been supplied. So first set the filepath like so:
cvj_object.image_folder_path = "/your/path/to/images"
This method creates a dictionary using the image id as the key and the filepaths associated with the image id as the value.
If there is already one created it just returns the previously made one to improve performance.
Parameters: img_id (int) – (Default = None) This is the image ID for an image that is in the JSON data of the object. Returns: - string (string) – This is the filepath of the supplied Image ID
- dict (dict) –
- This returns only if the img_id is not supplied to this method
- keys = image ids
- values = the filepaths associated with each image id
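The mapping from image id to file path can be pictured as joining the image folder with each file name. The sketch below assumes exactly that. image_id_to_filepath is a hypothetical helper, and the folder and file names are placeholders.

```python
import os


def image_id_to_filepath(images, image_folder_path):
    """Return {image id: folder joined with the image's file name}."""
    return {
        img["id"]: os.path.join(image_folder_path, img["file_name"])
        for img in images
    }


images = [
    {"id": 1, "file_name": "0001.png", "width": 640, "height": 480},
    {"id": 2, "file_name": "0002.png", "width": 640, "height": 480},
]
id_to_path = image_id_to_filepath(images, "/your/path/to/images")
```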
-
get_image_id_2_image_attribs
(img_id=None, json_data=None)¶ This method creates a dictionary using the image id as the key and the attributes of that image as the value
If there is already one created it just returns the previously made one to improve performance.
Parameters: - img_id (int) – (Default = None) This is the image ID for an image that is in the JSON data of the object or the supplied JSON data from the json_data variable.
- json_data (dict) –
(Default value = None) This is the loaded data from a COCO formatted JSON file.
- If this is supplied all data returned will be from this variable.
Returns: dict (dict) – If the image id is supplied to the img_id variable then this method returns a dict with the attributes of the image. For more information on the format of the dictionary returned look at the top of this script or refer to the official documentation page here -> https://bengarrard.bitbucket.io/ and look for “Introduction to the CVJ”.
dict (dict) –
- This returns only if the img_id is not supplied to this method
- keys = image ids
- values = image attributes associated with each image id
-
get_image_ids
()¶ This method returns the image id’s from the internal json data of the CVJ object
Returns: list Return type: list
-
get_max_counts_per_img
(show_plot=True)¶ This method will plot the most demanding image for cropping each bounding box. This method goes through each image and counts the bounding boxes corresponding to the image. It then stores the maximum count of annotations for a class for that image in a dictionary with the key as the img id.
This ends up being that each img_id will show the maximum count of a class out of all classes within each image. This will be plotted using seaborn.
To be quite honest I don’t think that the chart is very useful, however the returned data can be.
Parameters: show_plot (bool) – (Default value = True) Returns: Example
If I have img_id 1 and I want to know which class is the most dominant in this image then I just simply call this method like below
from cvjson.cvj import CVJ

cvj_object = CVJ(json_path)  # json_path is the path to a COCO formatted json file
image_id_2_class_counts, classes = cvj_object.get_max_counts_per_img(show_plot=False)

# enumerate keeps the index into classes in step with the image ids
for i, (image_id, class_count) in enumerate(image_id_2_class_counts.items()):
    print("The img_id {} has class {} as the most dominant class with {} annotations".format(image_id, classes[i], class_count))
Then in the plot I just look at the x axis and find the number 1 and then see what class is
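The counting step described for get_max_counts_per_img() could be sketched as below. This is an assumption-laden illustration: max_counts_per_image is a hypothetical helper, the two return values mirror the shape used in the example above (a dictionary keyed by image id and a parallel list of classes), and ties between classes are not given any special handling.

```python
from collections import Counter, defaultdict


def max_counts_per_image(annotations):
    """Return ({image id: highest class count}, [dominant class per image])."""
    per_image = defaultdict(Counter)
    for ann in annotations:
        per_image[ann["image_id"]][ann["category_id"]] += 1

    max_counts = {}
    dominant_classes = []
    for image_id, counter in per_image.items():
        # most_common(1) yields the (class id, count) pair with the highest count.
        class_id, count = counter.most_common(1)[0]
        max_counts[image_id] = count
        dominant_classes.append(class_id)
    return max_counts, dominant_classes


annotations = [
    {"image_id": 1, "category_id": 5},
    {"image_id": 1, "category_id": 5},
    {"image_id": 1, "category_id": 2},
    {"image_id": 2, "category_id": 2},
]
max_counts, dominant_classes = max_counts_per_image(annotations)
```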
-
load_json
(path) This is a helper method that loads an external json file and returns the data to the user. The data is not stored in the object. To load new json data into the object, the object’s json path must be set.
Parameters: path (string) – This path must be to any valid json file. Returns: dict – - keys = User defined
- values = User defined
Return type: dict
-
remove_by_name
(list_of_image_names, save=False) This method removes all of the annotations and images associated with the list of image names supplied. The image names must be the basenames of the files. This method cleans the categories of the internal json data after completing the removal of the images and the annotations associated with them.
Parameters: - list_of_image_names (list) – This argument is the list of basenames for the images to be removed. So they must be names like “8.png, 8.tif, 4.jpeg” and not like “home/User/8.png, server/Desktop/5.tif”.
- save (bool, optional) – (Default value = False) If True, the internal json data is saved to the json file that was used to give the CVJ object its data.
Returns: - list (list) – The first return value is named “list_of_image_names”, which is just the list of names that was supplied.
- list (list) – The second return value is named “imgs”; it is a list of the image attributes that have been removed from the internal json data.
- list (list) – The third return value is named “anns”; it is a list of the annotations that have been removed from the internal json data.
- list (list) – The fourth return value is named “cats”; it is a list of the categories that have been removed from the internal json data because they no longer have annotations associated with them.
-
replace_extensions_of_json_images
(replacement='.png', save=False)¶ This method replaces the file extension of the images to the replacement type given.
Parameters: - replacement (string) – (Default = .png) This is the extension to replace the existing extensions with.
- save (bool) – (Default = False) This is used to save the internal json data to the json file found at the path given using cvj_obj.json_path = “path/to/your/json”
Returns: dict – This returns the internal json data of the CVJ object.
Return type: dict
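The extension swap can be sketched with os.path.splitext. This is an illustration only: replace_extensions is a hypothetical helper that operates on a bare dictionary and does not model the save option.

```python
import os


def replace_extensions(json_data, replacement=".png"):
    """Rewrite every image file_name so that it ends with `replacement`."""
    for img in json_data["images"]:
        stem, _old_extension = os.path.splitext(img["file_name"])
        img["file_name"] = stem + replacement
    return json_data


data = {
    "images": [
        {"id": 1, "file_name": "a.tif", "width": 10, "height": 10},
        {"id": 2, "file_name": "b.jpeg", "width": 10, "height": 10},
    ],
    "categories": [],
    "annotations": [],
}
updated = replace_extensions(data, replacement=".png")
```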
-
save_internal_json
(save_name=None) This method saves the internal json data dictionary. It is intended for use after the internal json data has been updated. If the save_name variable is supplied, the data is saved at that location with that name; otherwise it overwrites the json that was given to the object.
Parameters: save_name (string) – (Default value = None) This parameter is the name of the file. While I am saying name, I mean it could be a file path plus the actual file name. Returns: dict – - The internal json dictionary.
Return type: dict
-
transfer_negatives_to_other_json
(path_to_images=None, cvj_obj=None, json_data=None, json_path=None, save=False)¶ This method looks for negative sample type images in the internal json data that was created by the Painter class and then transfers those images to another json that is supplied via a path to a json file or the actual data from the json to be transferred to.
Parameters: - path_to_images (string) – (Default = None) This is the path that needs to be pointing to the negative images. This is used to get the path names and check the images for height and width. If an error occurs during the check, it means that the file wasn’t read correctly by OpenCV and your file may be corrupt.
- cvj_obj (CVJ) – (Default value = None) This argument is for a CVJ object that has already been loaded with a json path or json data. If you need to know what the CVJ object is read “Introduction to the CVJ”
- json_data (dictionary) – (Default value = None) This value is used to transfer the negative images over to a dictionary that is already COCO formatted. So if the user calling this method has loaded a json file already
- json_path (string) – (Default value = None) This is a path to a COCO formatted json file. In this method it is used to create a CVJ object
- save (bool) – (Default value = False) This argument is used to save the internal json at the json path supplied to the object that has called this method. It defaults to False because saving could take a while; this is up to the user to decide.
Returns: - CVJ (CVJ) – The first return value is “cvj_obj”, which is a CVJ object. This holds the transferred images now and will need to be saved by the user. If a json path was supplied here, upon returning the user can call “save_internal_json()” and it will save it where the json file is. If needing to understand what the CVJ object is, refer to “Introduction to the CVJ”.
- list (list) – The second return value is “imgs” which is a list of the images that have been transferred to the supplied json data.
-
update_images
(list_of_paths, remove=False)¶ This method updates the image annotations within the json_data that is stored within this CVJ object
Parameters: list_of_paths (list) – This parameter is a list of paths to the images that the user is wanting to input in to the internal json data.
-
xywh_to_xyxy
(bboxes)¶ This method converts the bounding boxes of a numpy array in the format [[x, y, width, height]] to the format [[x1, y1, x2, y2]]
Parameters: bboxes (numpy array) – This is the numpy array for bounding boxes in the format of [[x, y, width, height]] Returns: numpy array – - This is the numpy array for bounding boxes in the format of [[x1, y1, x2, y2]]
Return type: numpy array Example
This code takes the bounding boxes in the form of a numpy array with the format x, y, width, and height. For example:
x = bboxes[:, 0]        # all of the x's in the array
y = bboxes[:, 1]        # all of the y's in the array
widths = bboxes[:, 2]   # all of the w's in the array
heights = bboxes[:, 3]  # all of the h's in the array
Your array should look roughly like this:
[[24, 25, 4, 5], [50, 50, 7, 6], [....], [....]]
-
xyxy_to_xywh
(xyxy)¶ This method converts the bounding boxes of a numpy array in the format [[x1, y1, x2, y2]] to the format [[x, y, width, height]]
Parameters: xyxy (numpy array) – This is the numpy array for bounding boxes in the format of [[x1, y1, x2, y2]] Returns: numpy array – - This is the numpy array for bounding boxes in the format of [[x, y, width, height]]
Return type: numpy array
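The two conversions can be sketched with numpy as follows. This is a minimal illustration, not the library's code. It assumes x2 = x + width and y2 = y + height with no one-pixel offset, and it returns float arrays. The function names match the methods above only for readability.

```python
import numpy as np


def xywh_to_xyxy(bboxes):
    """Convert [[x, y, width, height]] rows to [[x1, y1, x2, y2]] rows."""
    bboxes = np.asarray(bboxes, dtype=float)
    out = bboxes.copy()
    out[:, 2] = bboxes[:, 0] + bboxes[:, 2]  # x2 = x + width (assumed convention)
    out[:, 3] = bboxes[:, 1] + bboxes[:, 3]  # y2 = y + height (assumed convention)
    return out


def xyxy_to_xywh(xyxy):
    """Convert [[x1, y1, x2, y2]] rows back to [[x, y, width, height]] rows."""
    xyxy = np.asarray(xyxy, dtype=float)
    out = xyxy.copy()
    out[:, 2] = xyxy[:, 2] - xyxy[:, 0]  # width = x2 - x1
    out[:, 3] = xyxy[:, 3] - xyxy[:, 1]  # height = y2 - y1
    return out


boxes = np.array([[24, 25, 4, 5], [50, 50, 7, 6]])
corners = xywh_to_xyxy(boxes)
restored = xyxy_to_xywh(corners)
```

Under the stated convention the two functions are inverses of each other, so converting and converting back returns the original boxes.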