Introduction to the CVJ

This is the beginning documentation of a python project known as CVJSON. The project is basically an extension module for python dictionaries that are handling COCO formatted data.

Why Use This Library

There are a significant amount of reasons for using the library. However I am only going to talk about the 2 main ones. Reducing the redundant utility scripting and this is a way to have everyone benefit from each others utility programming.

If you needed more functionality and wrote a piece of code for it, add it to this library and then everyone else can benefit also.

COCO Format Needed Don’t Skip This Section

The COCO format has different formats for different types of detections. We are looking at the object detection format for most of this library’s functionality. Even though this library supports COCO, it is a partial support since the tracking is only done for portions of the format. The next section describes how to format the json

The basic template portion of COCO is like this

{
"info" : info,
"images" : [image],
"annotations" : [annotation],
"licenses" : [license],
}

Then from here there is further formatting. An “annotation” needs to be in the below format which is part of the “annotations” list in the above example. Then categories list needs to be added to the above format.

From COCO http://cocodataset.org/#format-data

annotation  : {
              "id": int,
              "image_id": int,
              "category_id": int,
              "segmentation": RLE or [polygon],
              "area": float,
              "bbox": [x,y,width,height],
              "iscrowd": 0 or 1,
              }

categories  : [{
                "id": int,
                "name": str,
                "supercategory": str,
                 }],

Main Structure

Which then leaves us with the main structure below. Since we have just added the “categories” list to the format.

{
"info" : info,
"images" : [image],
"annotations" : [annotation],
"licenses" : [license],
"categories": [category]
}

However, this library is pretty much just using and checking for the part of the json that is like below, if it is like the one above, that is fine, the library should be able to handle it.

{
"images" : [image],
"annotations" : [annotation],
"categories": [category]
}

Remember though that the json needs to have the annotation’s in the format like below.

annotation  : {
              "id": int,
              "image_id": int,
              "category_id": int,
              "segmentation": RLE or [polygon],
              "area": float,
              "bbox": [x,y,width,height],
              "iscrowd": 0 or 1,
              }

Also each image annotation must have at least “id”, “width”, “height”, and “file_name” from the structure below. Think that every image annotation that is in the “images” list that is a part of the Main Structure

image: {
        "id" : int,
        "width" : int,
        "height" : int,
        "file_name" : str,
        "license" : int,
        "flickr_url" : str,
        "coco_url" : str,
        "date_captured" : datetime,
    }

The library does not support the “license”, “flickr_url”, “coco_url”, or “date_captured” keys. If creating a json with this library or adding annotations it will be in the format below for image annoations.

image: {
        "id" : int,
        "width" : int,
        "height" : int,
        "file_name" : str
    }

Quick Start

Go to here and download the git repo. Afterwards just change directory to the newly downloaded cvjson repo and type “pip install”.

So a walk through of those steps would be like below:

git clone https://bitbucket.org/mayachitrainc/cvjson/src/master/
pip install cvjson/

Now if you plan to use the Painter extension then there is another program to install. That program is GMIC, which is a powerful plugin used in the GIMP editing software.

sudo apt-get install gmic

Now for further details go to The Basics to dive right in.

The Basics

The basic object is a CVJ (Computer Vision JSON) which holds the json data, path, etc.

Below is how to get the average area of each class and plot it using seaborn. With three lines of code you now have more insights on object data.

from cvjson.cvj import CVJ

json_path = "the/path/to/your/json"

cvj_object = CVJ(json_path)
classes = cvj_object.get_average_area_by_class(show_plot=True)

An example of how one can get the class distribution would be:

from cvjson.cvj import CVJ

json_path = "/home/typhon/Downloads/full_xview_coco_.json"

cvj_obj = CVJ(json_path=json_path)
cvj_obj.get_distribution_of_class_id(show_plot=True)

The above example returns a list of class id’s that are not unique. But the real part is that this will use seaborn in the backend and generate a histogram based on that list. The list is just returned in case you want to plot it differently.

CVJ object is a Dictionary!?

A cool feature is that you can use this object just like a normal dictionary that you loaded your json file from.

from cvjson.cvj import CVJ

json_path = "the/path/to/your/json"

cvj_object = CVJ(json_path)
annotations = cvj_object["annotations"]

All of the functions you would find in a dictionary is included with the object. Mainly because it is using a dictionary to hold the information and not just the __dict__ that represents the object. If needing to understand what a __dict__ is for an python object, here is a link that gives a good idea.

You can also iterate through the object just like you would if you loaded the json in to a python dictionary.

Iterating the CVJ Object!?

from cvjson.cvj import CVJ

json_path = "the/path/to/your/json"

cvj_obj = CVJ(json_path)

for key in cvj_object:
    print(key)

The output for that should be “images”, “categories”, and “annoations” if the supplied json is correct, which it should be since there are checks in place to see if it is a COCO formatted json. For now only COCO is supported, but the idea to convert other JSON formats dealing with computer vision have been an idea. Most likely if that were to be supported it would take in the other JSON type and convert it to COCO and you would interact with it like a COCO formatted dictionary.

Basic Dictionary Subsets of the JSON Data

So the CVJ object creates dictionaries to quickly access data and a lot of the time these dictionaries will need to be used.

Below is a coded example of all of the dictionaries that are regularly used and how to get them.

"""
These aren't real enums, but class variables used like enums

Dictionary enums:
                * IMID_2_ANNS , Image ID to Annoations
                * CLID_2_NAME, Class ID to Class Name
                * CLNAME_2_CLID, Class name to Class ID
                * IMID_2_FNAME, Image ID to File Name
                * FNAME_2_IMID, File Name to Image ID
                * IMID_2_FPATH, Image ID to File Path
                * IMID_2_IMATTR, Image Id to Image Attributes
                * CLID_2_ANNS, Class ID to Annotations

"""

from cvjson.cvj import CVJ

json_path = "the/path/to/your/json"

cvj_object = CVJ(json_path)

image_id_2_anns = cvj_obj.get_dictionary(CVJ.IMID_2_ANNS)
class_id_2_class_name = cvj_obj.get_dictionary(CVJ.CLID_2_NAME)
class_name_2_class_id = cvj_obj.get_dictionary(CVJ.CLNAME_2_CLID)
image_id_2_file_name = cvj_obj.get_dictionary(CVJ.IMID_2_FNAME)
file_name_2_image_id = cvj_obj.get_dictionary(CVJ.FNAME_2_IMID)
image_id_2_file_path = cvj_obj.get_dictionary(CVJ.IMID_2_FPATH)
image_id_2_image_attributes = cvj_obj.get_dictionary(CVJ.IMID_2_IMATTR)
class_id_2_annotations = cvj_obj.get_dictionary(CVJ.CLID_2_ANNS)

One can use the getter methods that are similar to these dictionaries and grab the info like a normal, but the above dictionaries can also be used just in case.

Basic Data Insights

Each of the below are basic data insight methods that by defualt plot information out using seaborn in the backend. Well, once you have loaded a correctly formatted JSON as described at the first section found here -> COCO Format Needed Don’t Skip This Section

from cvjson.cvj import CVJ

json_path = "the/path/to/your/json"

cvj_object = CVJ(json_path)

cvj_obj.get_class_count_by_image_id(some_image_id)
cvj_object.get_distribution_of_class_id()
cvj_object.get_distribution_of_area()
cvj_object.get_average_area_by_class()
cvj_object.get_bbox_count_by_class_dict()
cvj_object.get_average_side_lengths()

The Convenience and the Traps

The getter methods can return many types of values depending on what get method is called. The first example is the normal way one would use a getter method.

Example 1

from cvjson.cvj import CVJ

json_path = "the/path/to/your/json"
# The image ID is arbitrary in this example
img_id = 1

# This grabs the annotations associated with that image id
annoations_for_img_id_one = cvj_obj.get_image_id_2_anns(img_id)

Example 1 shows the normal use of the get method. The user puts in the image ID for which the method returns the annoations for that particular image.

The return will change depending on what is or is not given. The example below shows that the user will get the dictionary that is used to find the annotations in Example 1 if there are no arguments given.

Example 2

from cvjson.cvj import CVJ

json_path = "the/path/to/your/json"

# The image ID is arbitrary in this example
img_id = 1

cvj_obj = CVJ(json_path)

# This grabs the dictionary from image ID to annoations from the data that was gathered
# from the JSON file loaded in to the object during instantiation
image_id_2_anns_dict= cvj_obj.get_image_id_2_anns()

So since the user now has the dictionary from the method they can use it like the following.

from cvjson.cvj import CVJ

json_path = "the/path/to/your/json"

# The image ID is arbitrary in this example
img_id = 1

cvj_obj = CVJ(json_path)

# This grabs the dictionary from image ID to annoations from the data that was gathered
# from the JSON file loaded in to the object during instantiation
image_id_2_anns_dict= cvj_obj.get_image_id_2_anns()

annotions_for_image_id_1 = image_id_2_anns_dict[img_id]

Now the functionality of the same method can change even further on if some other json data that is supplied to the method. Sometimes a need arises where one has manipulated the data in a new json-type dictionary and needs information from that dictionary. This functionality is used within the Cropper extension/subclass.

RULE : If you supply this method with JSON data using the json_data variable, everything returned will always be from the newly supplied data

Example 3

from cvjson.cvj import CVJ
import json

json_path = "the/path/to/your/json"
some_other_json = "path/to/another/json"

with open(some_other_json, 'r') as file:
    other_json_data = json.load(file)

# The image ID is arbitrary in this example
img_id = 1

cvj_obj = CVJ(json_path)

# This grabs the annotations associated with that image id from the supplied json data
annoations_for_img_id_one_from_other_json_data = cvj_obj.get_image_id_2_anns(img_id=img_id, json_data=other_json_data)

What this is doing now is grabbing the data from the variable “other_json_data” and is searching for the annotations that are associated with the image id also supplied. By default this is not how the method works obviously because it is not intuitive, but the need does arise where this functionality is preferable.

Below shows that the user will get the dictionary that has keys as image IDs and values of annotations from the json data supplied to the json_data variable.

Example 4

from cvjson.cvj import CVJ
import json

json_path = "the/path/to/your/json"
some_other_json = "path/to/another/json"

with open(some_other_json, 'r') as file:
    other_json_data = json.load(file)

# The image ID is arbitrary in this example
img_id = 1

cvj_obj = CVJ(json_path)

# This grabs the dictionary from image ID to annoations for the supplied json data
image_id_2_anns_dict_for_some_other_json = cvj_obj.get_image_id_2_anns(json_data=other_json_data)

So the above got the dictionary for the json_data that was supplied to the method rather than the json_data that is within the CVJ object.

Now that that is settled there are more dictionary gathering methods that just that previous one. These are all used to quickly find the data in the JSON that was supplied.

from cvjson.cvj import CVJ

json_path = "/home/typhon/Downloads/full_xview_coco_.json"

cvj_object = CVJ(json_path)

"""
These get methods below are not used in the traditional sense right now.
They are returning dictionaries and do not require arguments.

That is the Trap in this library
"""
file_name_2_img_id_dict = cvj_obj.get_filename_2_image_id()
image_id_2_filename_dict = cvj_obj.get_image_id_2_filename()
image_id_2_annotations_dict = cvj_obj.get_image_id_2_anns()
image_id_2_image_attributes_dict = cvj_obj.get_image_id_2_image_attribs()

class_name_2_id_dict = cvj_obj.get_class_name_2_id()
class_id_2_name_dict = cvj_obj.get_class_id_2_name()

image_folder_path = "/home/typhon/Downloads/full_view_test"

# To use get_image_id_2_filepath() the image folder path variable
# must be set to the images associated with the JSON that has been supplied to the
# CVJ object.
cvj_obj.image_folder_path = image_folder_path
image_id_2_filepath_dict = cvj_obj.get_image_id_2_filepath()

# this one is not a dictionary that gets returned it is a list
image_ids = cvj_obj.get_image_ids()