cvjson package

cvjson.cvj module

Author: Benjamin Anderson Garrard

Official documentation https://bengarrard.bitbucket.io/

This module provides a handle object for json files in the COCO format. The aim of the handle is to cut down on repeated boilerplate code, make that code safer and easier to use, and make data extraction very simple.

This API uses CVJ as the superclass; all subclasses are meant to extend its functionality.

The structure this library uses is as follows.

annotations  :
            [{
                "id": int,
                "image_id": int,
                "category_id": int,
                "segmentation": RLE or [polygon],
                "area": float,
                "bbox": [x,y,width,height],
                "iscrowd": 0 or 1,
                }]

categories  : [{
                "id": int,
                "name": str,
                "supercategory": str,
                 }],

images      :  [{
                "id": int,
                "file_name": str,
                "width": int,
                "height": int
                }],

For more information on why this structure was chosen, read “Introduction to the CVJ”.
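To make the structure above concrete, here is a minimal sketch of a dataset with one image, one category, and one bounding box annotation, written as a plain Python dictionary. All values (ids, file name, category names) are made up for illustration, and the empty segmentation list is an assumption of this sketch.

```python
import json

# A hypothetical, minimal dataset: one image, one category, one box annotation.
coco = {
    "images": [
        {"id": 1, "file_name": "8.png", "width": 640, "height": 480},
    ],
    "categories": [
        {"id": 5, "name": "car", "supercategory": "vehicle"},
    ],
    "annotations": [
        {
            "id": 100,
            "image_id": 1,           # refers to the "id" of an entry in "images"
            "category_id": 5,        # refers to the "id" of an entry in "categories"
            "segmentation": [],      # left empty in this sketch
            "area": 20.0,            # here simply width * height of the bbox
            "bbox": [24, 25, 4, 5],  # [x, y, width, height]
            "iscrowd": 0,
        },
    ],
}

# Round-trip through a JSON string to confirm the structure serializes cleanly.
text = json.dumps(coco, indent=2)
loaded = json.loads(text)
```

The three top-level keys are the same ones the rest of this page refers to as the internal json data.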

class cvjson.cvj.CVJ(json_path=None, image_folder_path=None)

Bases: object

The CVJ class is the most basic class and only gives information based on the json file supplied. It describes the data in that json file and its accompanying files (images, etc.) or helps turn that information into usable data. Anything beyond gaining insight into the json or gathering data from it is provided as an extension.

Dictionary enums:
  • IMID_2_ANNS, Image ID to Annotations
  • CLID_2_NAME, Class ID to Class Name
  • CLNAME_2_CLID, Class name to Class ID
  • IMID_2_FNAME, Image ID to File Name
  • FNAME_2_IMID, File Name to Image ID
  • IMID_2_FPATH, Image ID to File Path
  • IMID_2_IMATTR, Image ID to Image Attributes
  • CLID_2_ANNS, Class ID to Annotations
CLID_2_ANNS = 7
CLID_2_NAME = 1
CLNAME_2_CLID = 2
FNAME_2_IMID = 4
IMID_2_ANNS = 0
IMID_2_FNAME = 3
IMID_2_FPATH = 5
IMID_2_IMATTR = 6
NEGATIVE_CLASS = 63428483
categ_idx_to_coco_categ(id)

This method creates a pseudo COCO category entry. Apart from the class id (AKA category id), the entry carries little useful information.

Parameters:id (int) – This parameter is the class id, AKA the category id
Returns:dict
  • keys = “id”, “name”, “supercategory”. For further explanation see “Introduction to the CVJ”
Return type:dict
clean_categories(save=False)

This method cleans the categories in the internal json data. Any category that has no annotations is removed from the internal json data.

Parameters:save (bool, optional) – (Default value = False) This option saves the internal json data to the json file that was used to give the CVJ object its data.
Returns:list – The return value is named “remove_list”; it is a list of the categories that have been removed from the internal json data.
Return type:list
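The cleaning rule described above can be sketched in plain Python as follows. This is not the library's implementation; `remove_unused_categories` and the sample data are hypothetical, and the sketch only assumes the COCO structure shown at the top of this page.

```python
def remove_unused_categories(json_data):
    """Remove categories without annotations in place; return the removed ones."""
    used_ids = {ann["category_id"] for ann in json_data["annotations"]}
    remove_list = [cat for cat in json_data["categories"] if cat["id"] not in used_ids]
    json_data["categories"] = [cat for cat in json_data["categories"] if cat["id"] in used_ids]
    return remove_list


# Hypothetical data: category 2 ("bear") has no annotations.
data = {
    "images": [{"id": 1, "file_name": "8.png", "width": 640, "height": 480}],
    "categories": [
        {"id": 1, "name": "car", "supercategory": "vehicle"},
        {"id": 2, "name": "bear", "supercategory": "animal"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [24, 25, 4, 5],
         "area": 20.0, "segmentation": [], "iscrowd": 0},
    ],
}

removed = remove_unused_categories(data)
```

The same pattern, keyed on "image_id" instead of "category_id", would apply to images without annotations.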
clean_images(save=False)

This method cleans the images in the internal json data. Any image that has no annotations is removed from the internal json data.

Parameters:save (bool, optional) – (Default value = False) This option saves the internal json data to the json file that was used to give the CVJ object its data.
Returns:list – The return value is named “remove_list”; it is a list of the image attributes that have been removed from the internal json data.
Return type:list
create_empty_json()

This method assists in making the json for a COCO format.

Returns:dict
  • keys = “images”, “categories”, “annotations”. Example found in “Introduction to the CVJ”
Return type:dict
create_json(new_json, save_path=None)

This method creates a json file from the dictionary that is supplied. If no save path is supplied, the file is created in the folder that contains the json path supplied to the object, and the file name will be “_new_json_DEFAULT.json”.

Parameters:
  • new_json (dict) –
  • save_path (string) – (Default value = None) Needs to be a path with a file name
Returns:

dict

  • The same dict that was supplied

Return type:

dict

create_json_by_class(list_of_ids, verbose=True)

This method creates a json based on the class ids supplied. The json will only have annotations for those classes.

The json is not saved in this method.

Parameters:
  • list_of_ids (list) – A list of class ids to be included in the json dictionary
  • verbose (bool) – (Default value = True) This prints a message with the current iteration count while the method searches through the annotations in the json.
Returns:

dict

  • keys = class ids
  • values = annotations associated with each class

Return type:

dict
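As an illustration of the return value described above (class ids as keys, annotations as values), here is a minimal sketch in plain Python. The function name `annotations_by_class` and the sample annotations are hypothetical; the real method may build additional structure.

```python
def annotations_by_class(json_data, list_of_ids):
    """Group the annotations of the requested class ids by class id."""
    result = {class_id: [] for class_id in list_of_ids}
    for ann in json_data["annotations"]:
        if ann["category_id"] in result:
            result[ann["category_id"]].append(ann)
    return result


# Hypothetical annotations covering three classes.
data = {
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1},
        {"id": 2, "image_id": 1, "category_id": 2},
        {"id": 3, "image_id": 2, "category_id": 3},
        {"id": 4, "image_id": 2, "category_id": 1},
    ]
}

selected = annotations_by_class(data, [1, 3])
```

Annotations of class 2 are left out because that id was not requested.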

create_json_of_class_focused_images(list_of_class_ids)

This method generates a dictionary in the COCO format json that this library uses from a file generated by the cropper known as “{TIMESTAMP}_image_class_counts.json”.

That file holds the filepath of each image in “{TIMESTAMP}_coco_train.json” and the class that each image was created from.

Given a list of class ids, this method uses that file to build a new json containing the images most strongly associated with those ids. “Strongly associated” means the class will be at the center of these images if crop_images_bbox_centered() was used. An image may still contain annotations for other classes.

NOTE: This method does not save the dictionary. The user must save it. create_json() can do the trick.

Parameters:list_of_class_ids (list) – A list of class ids to single out for the new json
Returns:dict
  • keys = “images”, “annotations”, “categories”
  • values = [image], [annotation], [category]
Return type:dict
entry_bbox(bbox, class_id, image_id, id)

This method assists with entering valid annotations.

Parameters:
  • bbox (list) – This parameter is bounding box coordinates in the format of [x, y, width, height]
  • class_id (int) – This parameter is the class id also known as the category id
  • image_id (int) – This parameter is the image id that this annotation belongs to
  • id (int) – This parameter is the id of the annotation
Returns:

dict

  • keys = annotation format found in “Introduction to the CVJ”
  • values = the values supplied.

Return type:

dict
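A minimal sketch of what such an annotation entry could look like, following the annotation format at the top of this page. `make_bbox_entry` is a hypothetical helper, and the choices to compute the area as width times height, leave the segmentation empty, and set iscrowd to 0 are assumptions of this sketch, not documented behavior.

```python
def make_bbox_entry(bbox, class_id, image_id, ann_id):
    """Build one annotation dictionary from a [x, y, width, height] box."""
    x, y, width, height = bbox
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": class_id,
        "segmentation": [],               # assumption: no mask available
        "area": float(width * height),    # assumption: area of the box itself
        "bbox": [x, y, width, height],
        "iscrowd": 0,                     # assumption: a single object
    }


entry = make_bbox_entry([24, 25, 4, 5], class_id=5, image_id=1, ann_id=100)
```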

entry_img(file_name, height, width, id)

This method assists with entering an image and the attributes of that image.

Parameters:
  • file_name (string) – This parameter is the filename of the image that is being inserted in to the json file
  • height (int) – This parameter is the height of the image. Normally this is equivalent to img.shape[0] when using numpy
  • width (int) – This parameter is the width of the image. Normally this is equivalent to img.shape[1] when using numpy
  • id (int) – This parameter is the image id. The image id and the file name are the most crucial components of this entry; without the image id, most of the functions in this object will not help the user.
Returns:

dict

  • keys = “file_name”, “height”, “width”, “id”. For more explanation see “Introduction to the CVJ”

Return type:

dict

get_annotations()

This method returns all of the annotations from the internal json data of the CVJ object.

Returns:list
Return type:list
get_average_area_by_class(show_plot=True)

This method computes the average area for each class and returns the values in a dictionary. If the show_plot parameter is True, it also plots the dictionary.

Parameters:show_plot (bool) – (Default value = True)
Returns:dict
  • Keys = class ids
  • values = average areas associated with each class id
Return type:dict
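The averaging itself can be sketched without any plotting. The helper name `average_area_by_class` and the sample annotations below are hypothetical; only the "category_id" and "area" keys from the annotation format are assumed.

```python
from collections import defaultdict


def average_area_by_class(annotations):
    """Return {class id: mean of the "area" values of that class}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ann in annotations:
        totals[ann["category_id"]] += ann["area"]
        counts[ann["category_id"]] += 1
    return {class_id: totals[class_id] / counts[class_id] for class_id in totals}


annotations = [
    {"category_id": 1, "area": 100.0},
    {"category_id": 1, "area": 300.0},
    {"category_id": 2, "area": 50.0},
]
averages = average_area_by_class(annotations)
```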
get_average_side_lengths(show_plot=True)

This method builds a list from the areas of the annotations, appending the square root of each area.

Parameters:show_plot (bool) – (Default value = True)
Returns:list – A list of the square roots of the areas. If the show_plot parameter is True, seaborn will plot it and wait for input.
Return type:list
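The square root of an area can be read as the side length of a square with the same area, which is presumably where the method name comes from. A minimal sketch of that computation, with a hypothetical helper name and made-up areas:

```python
import math


def side_lengths(annotations):
    """Return the square root of each annotation's area, in annotation order."""
    return [math.sqrt(ann["area"]) for ann in annotations]


annotations = [{"area": 100.0}, {"area": 400.0}, {"area": 2500.0}]
lengths = side_lengths(annotations)
```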
get_category_ids()

This method returns all of the category ids from the internal json data of the CVJ object.

Returns:list
Return type:list
get_category_names()

This method returns all of the category names from the internal json data of the CVJ object.

Returns:list
Return type:list
get_class_count_by_filename(filename, show_plot=True)

This method counts how many bounding boxes exist for each class on the image with the supplied filename.

NOTE: The filename supplied must be a part of the json file that is stored within the object.

Parameters:
  • filename (string) – This parameter is the filename of the image for which the user wants to know how many bounding boxes are on the image and what their classes are
  • show_plot (bool) – (Default value = True) This parameter when set to true generates a bar plot showing the class id on the y axis
Returns:

dict

  • keys = class ids
  • values = count of annotations for each of the class ids

Return type:

dict

get_class_count_by_img_id(img_id, show_plot=True)

This method counts how many bounding boxes exist for each class on the supplied image id. If the show_plot parameter is True, it generates a bar plot for the image id, with the classes on the x axis and the counts on the y axis.

NOTE: The image id supplied must be a part of the json file that is stored within the object.

Parameters:
  • img_id (int) – This parameter is the image id of the image for which the user wants to know how many bounding boxes are on the image and what their classes are
  • show_plot (bool) – (Default value = True) This parameter when set to true generates a bar plot showing the class ids on the x axis
Returns:

dict

  • keys = class ids
  • values = count of annotations for each of the class ids

Return type:

dict
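The counting step can be sketched with `collections.Counter`. The helper `class_count_for_image` and the sample annotations are hypothetical, and the sketch leaves out the plotting.

```python
from collections import Counter


def class_count_for_image(annotations, img_id):
    """Return {class id: number of annotations of that class on the image}."""
    counts = Counter(
        ann["category_id"] for ann in annotations if ann["image_id"] == img_id
    )
    return dict(counts)


annotations = [
    {"image_id": 1, "category_id": 5},
    {"image_id": 1, "category_id": 5},
    {"image_id": 1, "category_id": 7},
    {"image_id": 2, "category_id": 7},
]
counts_for_image_1 = class_count_for_image(annotations, img_id=1)
```

Counting by filename would work the same way after first translating the filename to its image id.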

get_class_id_2_anns(class_id=None, json_data=None, verbose=False)

This method returns a dictionary that has the class id as the key and the annotations to that class as the values.

If there is already one created it just returns the previously made one to improve performance.

Parameters:
  • class_id (int) – (Default = None) This is the class ID for the annotations associated with that class.
  • json_data (dict) –

    (Default value = None) This is the loaded data from a COCO formatted JSON file.

    • If this is supplied all data returned will be from this variable.
  • verbose (bool) – (Default value = False)
Returns:

  • dict (dict)
  • This returns only if the class_id is not supplied to this method
    • keys = class ids
    • values = annotations associated with each class id
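The "build once, then reuse" behavior described above can be sketched as a lazily filled cache. `ClassIndex` is a hypothetical stand-in for the real class, `build_count` exists only to make the caching visible, and returning the annotation list when class_id is supplied is an assumption of this sketch.

```python
class ClassIndex:
    """Hypothetical stand-in showing a cached class id to annotations lookup."""

    def __init__(self, json_data):
        self._json_data = json_data
        self._class_id_2_anns = None  # filled on first use
        self.build_count = 0          # how many times the index was built

    def get_class_id_2_anns(self, class_id=None):
        if self._class_id_2_anns is None:
            index = {}
            for ann in self._json_data["annotations"]:
                index.setdefault(ann["category_id"], []).append(ann)
            self._class_id_2_anns = index
            self.build_count += 1
        if class_id is not None:
            return self._class_id_2_anns[class_id]
        return self._class_id_2_anns


data = {"annotations": [
    {"id": 1, "category_id": 5},
    {"id": 2, "category_id": 7},
    {"id": 3, "category_id": 5},
]}
index = ClassIndex(data)
everything = index.get_class_id_2_anns()
only_fives = index.get_class_id_2_anns(class_id=5)
```

The second call reuses the dictionary built by the first call instead of scanning the annotations again.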

get_class_id_2_anns_count(show_plot=True)

This method gets the count of bounding boxes by class ID. If show_plot is True (the default), a seaborn bar chart pops up.

Parameters:show_plot (bool) – (Default value = True)
Returns:dict
  • keys = class ids
  • values = count of annotations for each of the class ids
Return type:dict
get_class_id_2_name(class_id=None, json_data=None)

This method creates a dictionary using the class id, AKA “category id”, as the keys and the category names, AKA class names, as the values.

If there is already one created it just returns the previously made one to improve performance.

Parameters:
  • class_id (int) – (Default = None) This is the class ID for a class that is in the JSON data of the object or the supplied JSON data from the json_data variable.
  • json_data (dict) – (Default value = None) This is the loaded data from a COCO formatted JSON file. If this is supplied, all data returned will be from this variable.
Returns:

dict

This is only returned if there is no class_id supplied to the method.
  • keys = category ids
  • values = category names associated with each category id

Return type:

dict

get_class_name_2_id(class_name=None, json_data=None)

This method creates a dictionary using the category names as the keys and the category ids as the values.

If there is already one created it just returns the previously made one to improve performance.

Parameters:
  • class_name (string) – (Default = None) This is the class name for a class that is in the JSON data of the object or the supplied JSON data from the json_data variable.
  • json_data (dict) –

    (Default value = None) This is the loaded data from a COCO formatted JSON file.

    • If this is supplied all data returned will be from this variable.
Returns:

  • int (int) – If class_name is supplied then this method returns the class ID.

  • dict (dict) –

    This is only returned if there is no class_name supplied to the method.
    • keys = class names like “bear”, “car”, “alien”, “person”, etc
    • values = the category ids, also known as the class ids: the number that represents each class.

get_count_files_by_class(verbose=False, show_plot=False)

This method is only used with the image_class_count.json file generated by the Cropper class. It shows how many files were made for each class. If cropping to the bounding box center was used, each class will have either as many images as it has bounding boxes, or more through augmentation.

Parameters:
  • verbose (bool) – (Default value = False) When True, information is printed to the console while the data is gathered. The output looks similar to “Class ID 5 has 280 images”
  • show_plot (bool) – (Default value = False) This parameter when set to true generates a bar plot showing the class id on the x axis
Returns:

dict

  • keys = class ids
  • values = image counts. (How many images are associated with the class id)

Return type:

dict

get_dictionary(cvj_enum)

Dictionary enums:
  • CVJ.IMID_2_ANNS = Image ID to Annotations
  • CVJ.CLID_2_NAME = Class ID to Class Name
  • CVJ.CLNAME_2_CLID = Class name to Class ID
  • CVJ.IMID_2_FNAME = Image ID to File Name
  • CVJ.FNAME_2_IMID = File Name to Image ID
  • CVJ.IMID_2_FPATH = Image ID to File Path
  • CVJ.IMID_2_IMATTR = Image ID to Image Attributes
  • CVJ.CLID_2_ANNS = Class ID to Annotations
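One way an enum-keyed lookup like this can work is a small dispatch table from enum value to builder function. The sketch below is not the library's code: `MiniCVJ` is a hypothetical stand-in that mirrors only two of the documented enum values (IMID_2_ANNS = 0 and CLID_2_NAME = 1).

```python
class MiniCVJ:
    """Hypothetical stand-in for CVJ; mirrors two of the documented enums."""

    IMID_2_ANNS = 0
    CLID_2_NAME = 1

    def __init__(self, json_data):
        self._json_data = json_data
        # Map each enum value to the method that builds its dictionary.
        self._builders = {
            self.IMID_2_ANNS: self._image_id_2_anns,
            self.CLID_2_NAME: self._class_id_2_name,
        }

    def _image_id_2_anns(self):
        result = {}
        for ann in self._json_data["annotations"]:
            result.setdefault(ann["image_id"], []).append(ann)
        return result

    def _class_id_2_name(self):
        return {cat["id"]: cat["name"] for cat in self._json_data["categories"]}

    def get_dictionary(self, cvj_enum):
        return self._builders[cvj_enum]()


data = {
    "categories": [{"id": 5, "name": "car", "supercategory": "vehicle"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 5},
        {"id": 2, "image_id": 1, "category_id": 5},
    ],
}
mini = MiniCVJ(data)
names = mini.get_dictionary(MiniCVJ.CLID_2_NAME)
anns_by_image = mini.get_dictionary(MiniCVJ.IMID_2_ANNS)
```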

get_distribution_of_area(show_plot=True)

This method makes a list of the areas from the annotations and just appends them.

Parameters:show_plot (bool) – (Default value = True)
Returns:list – A list of appended areas of the bounding boxes.
Return type:list

Example

So an example of what the returned list could look like is:

[100,100,2000,3000,4000,2405,500,50,500]

If the show_plot parameter is True, seaborn will plot the list and wait for input.

get_distribution_of_class_id(show_plot=True)

This method makes a list of the category ids from the annotations and just appends them.

Parameters:show_plot (bool) – (Default value = True)

Example

So an example of what the list could look like is:

[1,1,2,3,4,5,5,5,5]

If the show_plot parameter is True, seaborn will plot the list and wait for input.

get_filename_2_image_id(filename=None, json_data=None)

This method creates a dictionary using the filename as the key and the image id as the value.

If there is already one created it just returns the previously made one to improve performance.

Parameters:
  • filename (string) – (Default = None) This is the file name for an image that is in the JSON data of the object or the supplied JSON data from the json_data variable.
  • json_data (dict) –

    (Default value = None) This is the loaded data from a COCO formatted JSON file.

    • If this is supplied all data returned will be from this variable.
Returns:

  • int (int) – This is the image ID of the file name that was supplied to this method.

  • dict (dict) –

    This is only returned if there is no file name supplied to the method.
    • keys = filenames with the extension, so they will have “.png”, “.tif”, or something similar
    • values = image ids

get_filenames()

This method returns the filenames of images from the internal json data of the CVJ object.

Returns:list
Return type:list
get_image_id_2_anns(img_id=None, json_data=None)

This method creates a dictionary using the “image_id” as the key and, as the value, the list of annotations in the format described at the beginning of this module. If there is already one created it just returns the previously made one to improve performance.

Parameters:
  • img_id (int) – (Default = None) This is the image ID for an image that is in the JSON data of the object or the supplied JSON data from the json_data variable.
  • json_data (dict) – (Default value = None) This is the loaded data from a COCO formatted JSON file. If this is supplied, all data returned will be from this variable.
Returns:

dict

This is only returned if img_id is not supplied to the method.
  • keys = image ids
  • values = annotations associated with the image id

Return type:

dict

get_image_id_2_filename(img_id=None, json_data=None)

This method creates a dictionary using the image id as the key and the values are the filenames associated with the image id.

If there is already one created it just returns the previously made one to improve performance.

Parameters:
  • img_id (int) – (Default = None) This is the image ID for an image that is in the JSON data of the object or the supplied JSON data from the json_data variable.
  • json_data (dict) – (Default value = None) This is the loaded data from a COCO formatted JSON file. If this is supplied, all data returned will be from this variable.
Returns:

  • string (string) – If the img_id is supplied then this method will return the file name associated with that image id.

  • dict (dict) –

    This is only returned if there is no img_id supplied to the method.
    • keys = image ids
    • values = filenames with the extension, so they will have “.png”, “.tif”, or something similar

get_image_id_2_filepath(img_id=None)

This method will not work unless an image folder path has been supplied. So first set the path like so:

cvj_object.image_folder_path = "/your/path/to/images"

This method creates a dictionary using the image id as the key and the filepaths associated with the image id as the value.

If there is already one created it just returns the previously made one to improve performance.

Parameters:img_id (int) – (Default = None) This is the image ID for an image that is in the JSON data of the object.
Returns:
  • string (string) – This is the filepath of the supplied Image ID
  • dict (dict) –
    This returns only if the img_id is not supplied to this method
    • keys = image ids
    • values = the filepaths associated with each image id
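A minimal sketch of how such a mapping could be built, assuming each filepath is simply the image folder path joined with the image's "file_name". That joining rule, the helper name `image_id_2_filepath`, and the sample paths are assumptions of this sketch.

```python
import os


def image_id_2_filepath(images, image_folder_path):
    """Return {image id: folder path joined with the image's file name}."""
    return {
        img["id"]: os.path.join(image_folder_path, img["file_name"])
        for img in images
    }


images = [
    {"id": 1, "file_name": "8.png", "width": 640, "height": 480},
    {"id": 2, "file_name": "5.tif", "width": 640, "height": 480},
]
paths = image_id_2_filepath(images, "/your/path/to/images")
```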
get_image_id_2_image_attribs(img_id=None, json_data=None)

This method creates a dictionary using the image id as the key and the attributes of that image as the value

If there is already one created it just returns the previously made one to improve performance.

Parameters:
  • img_id (int) – (Default = None) This is the image ID for an image that is in the JSON data of the object or the supplied JSON data from the json_data variable.
  • json_data (dict) –

    (Default value = None) This is the loaded data from a COCO formatted JSON file.

    • If this is supplied all data returned will be from this variable.
Returns:

  • dict (dict) – If the image id is supplied to the img_id variable then this method returns a dict with the attributes of the image. For more information on the format of the dictionary returned look at the top of this script or refer to the official documentation page here -> https://bengarrard.bitbucket.io/ and look for “Introduction to the CVJ”.

  • dict (dict) –

    This returns only if the img_id is not supplied to this method
    • keys = image ids
    • values = image attributes associated with each image id

get_image_ids()

This method returns the image ids from the internal json data of the CVJ object.

Returns:list
Return type:list
get_max_counts_per_img(show_plot=True)

This method will plot the most demanding image for cropping each bounding box. This method goes through each image and counts the bounding boxes corresponding to the image. It then stores the maximum count of annotations for a class for that image in a dictionary with the key as the img id.

As a result, each img_id maps to the highest per-class annotation count found in that image. This will be plotted using seaborn.

To be quite honest I don’t think that the chart is very useful, however the returned data can be.

Parameters:show_plot (bool) – (Default value = True)
Returns:
  • dict (dict) –
    • keys = image ids
    • values = the counts of the most prominent class on each image

  • list (list) – The list of the classes that are in the same order as the keys in the returned dictionary

Example

If I have img_id 1 and I want to know which class is the most dominant in this image then I just simply call this method like below

from cvjson.cvj import CVJ

# json_path must point to a COCO formatted json file
cvj_object = CVJ(json_path)
image_id_2_class_counts, classes = cvj_object.get_max_counts_per_img(show_plot=False)

# classes is in the same order as the keys of the returned dictionary
for (image_id, class_count), class_id in zip(image_id_2_class_counts.items(), classes):
    print("The img_id {} has class {} as the most dominant class with {} annotations".format(image_id, class_id, class_count))

Then in the plot I just look at the x axis and find the number 1 and then see what class is
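The counting behind this method can be sketched in plain Python as follows. This is a guess at the logic from the description above, not the library's code; `max_counts_per_img` here is a standalone hypothetical function and the annotations are made up.

```python
from collections import Counter, defaultdict


def max_counts_per_img(annotations):
    """Return ({image id: highest class count}, [dominant class per image])."""
    per_image = defaultdict(Counter)
    for ann in annotations:
        per_image[ann["image_id"]][ann["category_id"]] += 1

    counts = {}
    classes = []  # same order as the keys of counts
    for image_id, counter in per_image.items():
        class_id, count = counter.most_common(1)[0]
        counts[image_id] = count
        classes.append(class_id)
    return counts, classes


annotations = [
    {"image_id": 1, "category_id": 5},
    {"image_id": 1, "category_id": 5},
    {"image_id": 1, "category_id": 7},
    {"image_id": 2, "category_id": 7},
]
counts, classes = max_counts_per_img(annotations)
```

When two classes tie on an image, this sketch keeps whichever was seen first; how the real method breaks ties is not documented here.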

load_json(path)

This is a helper method that loads an external json file and returns the data to the user. The data is not stored in the object. To load new json data into the object, the object's json path must be set instead.

Parameters:path (string) – This path must be to any valid json file.
Returns:dict
  • keys = User defined
  • values = User defined
Return type:dict
remove_by_name(list_of_image_names, save=False)

This method removes all of the annotations and images associated with the list of image names supplied. The image names must be basenames. After removing the images and their annotations, this method cleans the categories of the internal json data.

Parameters:
  • list_of_image_names (list) – This argument is the list of basenames for the images to be removed. So they must be names like “8.png, 8.tif, 4.jpeg” and not like “home/User/8.png, server/Desktop/5.tif”.
  • save (bool, optional) – (Default value = False) This option saves the internal json data to the json file that was used to give the CVJ object its data.
Returns:

  • list (list) – The first return value is named “list_of_image_names” which is just the list of names that was supplied.
  • list (list) – The second return value is named “imgs” and it is returning a list of image attributes that have been removed from the internal json data.
  • list (list) – The third return value is named “anns” and it is returning a list of annotations that have been removed from the internal json data.
  • list (list) – The fourth return value is named “cats”; it is a list of the categories that have been removed from the internal json data because they no longer have annotations associated with them.
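The removal steps and the four return values can be sketched in plain Python. `remove_images_by_name` is a hypothetical standalone function, and it assumes the "file_name" values in the json are already basenames.

```python
def remove_images_by_name(json_data, list_of_image_names):
    """Remove the named images, their annotations, and then unused categories."""
    names = set(list_of_image_names)
    imgs = [img for img in json_data["images"] if img["file_name"] in names]
    removed_ids = {img["id"] for img in imgs}
    anns = [a for a in json_data["annotations"] if a["image_id"] in removed_ids]

    json_data["images"] = [
        img for img in json_data["images"] if img["id"] not in removed_ids
    ]
    json_data["annotations"] = [
        a for a in json_data["annotations"] if a["image_id"] not in removed_ids
    ]

    # Clean categories that lost all of their annotations.
    used = {a["category_id"] for a in json_data["annotations"]}
    cats = [c for c in json_data["categories"] if c["id"] not in used]
    json_data["categories"] = [c for c in json_data["categories"] if c["id"] in used]
    return list_of_image_names, imgs, anns, cats


data = {
    "images": [{"id": 1, "file_name": "8.png"}, {"id": 2, "file_name": "4.jpeg"}],
    "categories": [{"id": 5, "name": "car"}, {"id": 7, "name": "bear"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 5},
        {"id": 11, "image_id": 2, "category_id": 7},
    ],
}
names, imgs, anns, cats = remove_images_by_name(data, ["4.jpeg"])
```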

replace_extensions_of_json_images(replacement='.png', save=False)

This method replaces the file extension of the images to the replacement type given.

Parameters:
  • replacement (string) –
    (Default = .png)
    This is the variable to replace the extensions with.
  • save (bool) – (Default = False) This is used to save the internal json data to the json file found at the path given using cvj_obj.json_path = “path/to/your/json”
Returns:

dict – This returns the internal json data of the CVJ object.

Return type:

dict
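A minimal sketch of the extension swap using `os.path.splitext`, which splits off only the last extension. The function `replace_extensions` and the sample file names are hypothetical; saving is left out.

```python
import os


def replace_extensions(json_data, replacement=".png"):
    """Swap the extension of every image "file_name" in place."""
    for img in json_data["images"]:
        stem, _old_extension = os.path.splitext(img["file_name"])
        img["file_name"] = stem + replacement
    return json_data


data = {"images": [
    {"id": 1, "file_name": "8.tif"},
    {"id": 2, "file_name": "scene.v2.jpeg"},
]}
updated = replace_extensions(data, replacement=".png")
```

Only the final extension is replaced, so a name such as "scene.v2.jpeg" keeps its ".v2" part.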

save_internal_json(save_name=None)

This method saves the internal json data dictionary. Use it after the internal json data has been updated. If the save_name variable is supplied, the data is saved at that location with that name; otherwise it overwrites the json that was given to the object.

Parameters:save_name (string) – (Default value = None) This parameter is the name of the file. It may be a file path plus the actual file name.
Returns:dict
  • The internal json dictionary.
Return type:dict
transfer_negatives_to_other_json(path_to_images=None, cvj_obj=None, json_data=None, json_path=None, save=False)

This method looks for negative sample type images in the internal json data that was created by the Painter class and then transfers those images to another json that is supplied via a path to a json file or the actual data from the json to be transferred to.

Parameters:
  • path_to_images (string) – (Default = None) This is the path that needs to be pointing to the negative images. This is used to get the path names and check the images for height and width. If an error occurs during the check, it means that the file wasn’t read correctly by OpenCV and your file may be corrupt.
  • cvj_obj (CVJ) – (Default value = None) This argument is for a CVJ object that has already been loaded with a json path or json data. If you need to know what the CVJ object is read “Introduction to the CVJ”
  • json_data (dictionary) – (Default value = None) This value is used to transfer the negative images over to a dictionary that is already COCO formatted. Use it when the caller of this method has already loaded a json file.
  • json_path (string) – (Default value = None) This is a path to a COCO formatted json file. In this method it is used to create a CVJ object
  • save (bool) – (Default value = False) This argument is used to save the internal json at the json path supplied to the object that has called this method. It defaults to False because saving could take a while; this is up to the user to decide.
Returns:

  • CVJ (CVJ) – The first return value is “cvj_obj”, which is a CVJ object. It now holds the transferred images and will need to be saved by the user. If a json path was supplied here, the user can call “save_internal_json()” on the returned object and it will be saved where the json file is. To understand what the CVJ object is, refer to “Introduction to the CVJ”.
  • list (list) – The second return value is “imgs” which is a list of the images that have been transferred to the supplied json data.

update_images(list_of_paths, remove=False)

This method updates the image annotations within the json_data that is stored within this CVJ object.

Parameters:list_of_paths (list) – This parameter is a list of paths to the images that the user wants to add to the internal json data.
xywh_to_xyxy(bboxes)

This method converts the bounding boxes of a numpy array in the format [[x, y, width, height]] to the format [[x1, y1, x2, y2]]

Parameters:bboxes (numpy array) – This is the numpy array for bounding boxes in the format of [[x, y, width, height]]
Returns:numpy array
  • This is the numpy array for bounding boxes in the format of [[x1, y1, x2, y2]]
Return type:numpy array

Example

This code takes the bounding boxes in the form of a numpy array with the format x, y, width, and height. For example

x       = bboxes[:,0] #is all of the x's in the array
y       = bboxes[:,1] #is all of the y's in the array
widths  = bboxes[:,2] #is all of the w's in the array
heights = bboxes[:,3] #is all of the h's in the array

Your array should look roughly like the one directly below

[[24, 25, 4, 5],
 [50, 50, 7, 6],
 [....],
 [....]]
xyxy_to_xywh(xyxy)

This method converts the bounding boxes of a numpy array in the format [[x1, y1, x2, y2]] to the format [[x, y, width, height]]

Parameters:xyxy (numpy array) – This is the numpy array for bounding boxes in the format of [[x1, y1, x2, y2]]
Returns:numpy array
  • This is the numpy array for bounding boxes in the format of [[x, y, width, height]]
Return type:numpy array
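Both conversions can be sketched with NumPy column slicing. The functions below are standalone illustrations that reuse the documented method names, not the library's implementation, and they assume the convention x2 = x + width and y2 = y + height (no off-by-one adjustment for inclusive pixel coordinates).

```python
import numpy as np


def xywh_to_xyxy(bboxes):
    """Convert [[x, y, width, height]] rows to [[x1, y1, x2, y2]] rows."""
    bboxes = np.asarray(bboxes)
    xyxy = bboxes.copy()                      # leave the input untouched
    xyxy[:, 2] = bboxes[:, 0] + bboxes[:, 2]  # x2 = x + width
    xyxy[:, 3] = bboxes[:, 1] + bboxes[:, 3]  # y2 = y + height
    return xyxy


def xyxy_to_xywh(xyxy):
    """Convert [[x1, y1, x2, y2]] rows to [[x, y, width, height]] rows."""
    xyxy = np.asarray(xyxy)
    xywh = xyxy.copy()
    xywh[:, 2] = xyxy[:, 2] - xyxy[:, 0]      # width = x2 - x1
    xywh[:, 3] = xyxy[:, 3] - xyxy[:, 1]      # height = y2 - y1
    return xywh


boxes_xywh = np.array([[24, 25, 4, 5],
                       [50, 50, 7, 6]])
boxes_xyxy = xywh_to_xyxy(boxes_xywh)
round_trip = xyxy_to_xywh(boxes_xyxy)
```

Under the stated convention the two functions are inverses of each other, so converting there and back returns the original boxes.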