
Using iOS CreateML#

Background#

The business requirement is to identify the quantity or type of specified objects in photos taken. Since there are no pre-trained models available online for these objects, we need to start from scratch. I therefore researched Apple's CreateML, and the specific steps are described below.

To identify the quantity of specified objects in photos, there are several implementation options:

  • Use a third-party platform to train data, generate models, and provide them for frontend use.
  • Build your own platform to train data, generate models, and provide them for frontend use.
  • Use Apple's CreateML tool to train data and generate models for iOS use, or convert them to other formats.

Comparing these options, Apple's CreateML tool saves us from having to build a platform, so let's see how to use it.

The overall process of using CreateML is:

  1. Have a large number of samples.
  2. Annotate all samples.
  3. Train a model using these samples.
  4. Validate the model's recognition rate.
  5. Test the model's effectiveness.
  6. Export the model for use.
image

The requirement is to identify the quantity of specified objects in photos taken, so for me, the samples are the photos. Let's see how to generate the annotation information required for CreateML training.

Usage#

Sample Photo Annotation#

First, we need a large number of sample photos. Since this is just research and testing, I selected 20 photos, sourced from Baidu Images... The troublesome part is photo annotation, since Apple's CreateML training requires a JSON file in a specific format, as follows:


[
    {
        "image": "image_name.jpg",
        "annotations": [
            {
                "label": "label_name",
                "coordinates": {
                    "x": 133,
                    "y": 240.5,
                    "width": 113.5,
                    "height": 185
                }
            }
        ]
    }
 ]
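
If you ever need to read or generate this annotation file yourself (for example, when converting from another tool's output), the structure maps cleanly onto Codable types. Below is a minimal Swift sketch; the type names are made up for illustration and are not part of CreateML, which only consumes the JSON file. In this format, x and y describe the center of the bounding box, in pixels.

import Foundation

// Illustrative Codable types mirroring the CreateML annotation JSON above.
struct AnnotatedImage: Codable {
    let image: String
    let annotations: [Annotation]
}

struct Annotation: Codable {
    let label: String
    let coordinates: Coordinates
}

struct Coordinates: Codable {
    // x/y are the center of the bounding box, width/height its size, in pixels.
    let x: Double
    let y: Double
    let width: Double
    let height: Double
}

// Example: decoding a Train/_annotations.createml.json file.
// let data = try Data(contentsOf: URL(fileURLWithPath: "Train/_annotations.createml.json"))
// let images = try JSONDecoder().decode([AnnotatedImage].self, from: data)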

Therefore, it is especially important to find an annotation tool that can directly export the JSON format supported by Create ML. Here, following Object Detection with Create ML: images and dataset, I use roboflow for annotation. The specific usage is as follows:

After logging into roboflow, there will be guided steps to create a workspace, create a new project under the workspace, and invite users to join the same workspace. The interface is as follows:

image

Click Create Project, enter the project name, select the project License, choose the project functionality, and enter the name of the object to be labeled, as shown below:

image

Then import photos. For example, download a dozen photos of steel pipes from the internet to train recognition of steel pipes (normally there should be far more photos, since more samples produce a more accurate model, but only a few are used here for demonstration). Note that after the photos are uploaded, do not click Finish Uploading in the upper right corner yet, since nothing has been annotated. The counters All Images (18) and Annotated (0) show that none have been annotated.

image

Double-click any photo to enter the annotation page:

  1. On the annotation page, first select the annotation tool on the far right; the default is the rectangular box tool.
  2. Then drag out the annotation area; note that when using curve annotation, the outline must be closed (the end point must meet the start point).
  3. In the pop-up Annotation Editor box, enter the label to apply and click Save.
  4. If there are multiple labels, repeat the steps above.
  5. Once all labels are done, click the back button on the left.
image

After all photos are annotated, click Finish Uploading, and a prompt box will pop up, allowing you to choose how to allocate photos to Train, Valid, and Test. By default, all photos are used for Train, as shown below:

image image

Click Continue to proceed. You can see that steps 1 and 2 are completed; to edit either of them, hover over the right side of the corresponding module to reveal the edit button. In step 3, Auto-Orient means "discard EXIF rotations and standardize pixel ordering"; if it is not needed, the step can be deleted. Resize resizes the photos, and you can also add further Preprocessing Steps. The purpose of preprocessing is to reduce the final training time.

image

Click Continue to enter step 4, which aims to create new training examples for the model to learn by generating augmented versions of each image in the training set. If not needed, you can directly click Continue to proceed to the next step.

image

The final step is to generate results. Click Generate and wait for the results to appear. The results are as follows:

image

The image above shows the total number of annotated photos and the Train, Validation, and Test splits. The Start Training button is the website's online training feature, which is not needed here; what we need is to export the annotations in CreateML format. Click Export and select the export format in the pop-up window. One of the best things about this website is how many formats it supports; here, select CreateML, check download zip to computer, and click Continue.

image

In Finder, find the downloaded xxx.createml.zip and unzip it. The contents are shown below: Train, Valid, and Test each have their own folder. Open the _annotations.createml.json under the Train folder and you can see that the format matches what CreateML requires, so it can be used directly.

image

Using CreateML#

Select Xcode -> Open Developer Tool -> Create ML to open the CreateML tool, as shown below:

image

Then enter the project selection interface. If there is an existing CreateML project, select to open it; if not, select the folder to save, and then click the New Document button in the lower left corner to enter the creation page. The creation page is as follows:

image

Choose the template to create as needed; here, select Object Detection, then enter the project name and description, as shown:

image

A side note: seeing Text Classification and Word Tagging reminds me of the SMS filtering apps in the App Store. They likely collect a large number of spam messages, train a model with CreateML, import it into the project, and then keep updating the model via Core ML to improve recognition for each individual user.

Click Next to enter the main interface, as shown in the following image. You can see the following parts:

  • Settings, Training, Evaluation, Preview, and Output at the top center represent the steps of CreateML and can be switched between. The later tabs are currently empty because there are no training results yet.
  • The Train button at the top left starts training once data has been imported in the Training Data section in the middle.
  • The Data section can use the contents of the downloaded folder; first import the Train folder, and after the training results come out you can continue with the subsequent steps.
  • Parameters section
    • Algorithm selects the training algorithm. The default FullNetworking is based on YOLOv2; you can also choose TransferLearning, which suits smaller numbers of samples and produces a smaller trained model. As for what YOLOv2 is, refer to A Gentle Introduction to Object Recognition With Deep Learning, which gives a general overview of several deep learning object detection algorithms.
    • Iterations is the number of training iterations; more is not necessarily better, but it should not be too small either.
    • Batch Size is the amount of data used in each training iteration; the larger the value, the more memory it uses. Refer to What is batch size in neural network?
    • Grid Size indicates how many cells the YOLO algorithm divides the image into; refer to gridsize.
image

Here, set Iterations to 100 and keep the rest at their defaults, then click Train. This enters the Training step; the final result is as follows:

image

Then select the Testing Data. Since the downloaded annotations only contain two items under Test, you can reuse the Train set as the Testing Data; click Test and wait for the results to appear, as shown:

image

For the meanings of I/U 50% and Varied I/U on this page, refer to CreateML object detection evaluation:

I/U means "Intersection over Union". It is a very simple metric which tells you how good the model is at predicting a bounding box of an object. You can read more about I/U in the "Intersection over Union (IoU) for object detection" article at PyImageSearch.
So:
I/U 50% means how many observations have their I/U score over 50%
Varied I/U is an average I/U score for a data set
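
To make the metric concrete, here is a small Swift sketch of computing IoU for two boxes given as CGRects (this is just to illustrate the definition; it is not part of CreateML):

import CoreGraphics

// Intersection over Union of two bounding boxes; returns a value in 0...1,
// where 1 means the boxes coincide exactly and 0 means they do not overlap.
func intersectionOverUnion(_ a: CGRect, _ b: CGRect) -> CGFloat {
    let overlap = a.intersection(b)
    guard !overlap.isNull, overlap.width > 0, overlap.height > 0 else { return 0 }
    let overlapArea = overlap.width * overlap.height
    let unionArea = a.width * a.height + b.width * b.height - overlapArea
    return overlapArea / unionArea
}

// Example: a prediction shifted halfway across the ground-truth box.
// intersectionOverUnion(CGRect(x: 0, y: 0, width: 100, height: 100),
//                       CGRect(x: 50, y: 0, width: 100, height: 100)) ≈ 0.33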

You can also view the recognition results more intuitively through Preview. Click to add files and select all the steel pipe photos.

image

Then check each one to see if it can be recognized.

image

Finally, switch to Output, click Get, and export the model xxx.mlmodel, as shown:

image

Using the trained model mlmodel#

Create a new project and import the trained model into the project, as shown:

image

Then import the Vision framework, load a photo, create a VNImageRequestHandler, and use it to run the recognition request. The code is as follows:


import UIKit
import CoreML
import Vision
import SnapKit   // used for the snp constraint helpers below

class PipeImageDetectorVC: UIViewController {

    // MARK: - properties
    fileprivate var coreMLRequest: VNCoreMLRequest?
    fileprivate var drawingBoxesView: DrawingBoxesView?
    fileprivate var displayImageView: UIImageView = UIImageView()
    fileprivate var randomLoadBtn: UIButton = UIButton(type: .custom)
    
    // MARK: - view life cycle
    override func viewDidLoad() {
        super.viewDidLoad()

        // Do any additional setup after loading the view.
        setupDisplayImageView()
        setupCoreMLRequest()
        setupBoxesView()
        setupRandomLoadBtn()
    }
    
    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
    }
    
    // MARK: - init
    fileprivate func setupDisplayImageView() {
        view.addSubview(displayImageView)
        displayImageView.contentMode = .scaleAspectFit
        displayImageView.snp.makeConstraints { make in
            make.center.equalTo(view.snp.center)
        }
    }
    
    fileprivate func setupCoreMLRequest() {
        guard let model = try? PipeObjectDetector(configuration: MLModelConfiguration()).model,
              let visionModel = try? VNCoreMLModel(for: model) else {
            return
        }
        
        coreMLRequest = VNCoreMLRequest(model: visionModel, completionHandler: { [weak self] (request, error) in
            self?.handleVMRequestDidComplete(request, error: error)
        })
        coreMLRequest?.imageCropAndScaleOption = .centerCrop
    }
    
    fileprivate func setupBoxesView() {
        let drawingBoxesView = DrawingBoxesView()
        drawingBoxesView.frame = displayImageView.frame
        displayImageView.addSubview(drawingBoxesView)
        drawingBoxesView.snp.makeConstraints { make in
            make.edges.equalToSuperview()
        }
        
        self.drawingBoxesView = drawingBoxesView
    }
    
    fileprivate func setupRandomLoadBtn() {
        randomLoadBtn.setTitle("Load a random image", for: .normal)
        randomLoadBtn.setTitleColor(UIColor.blue, for: .normal)
        view.addSubview(randomLoadBtn)
        let screenW = UIScreen.main.bounds.width
        let screenH = UIScreen.main.bounds.height
        let btnH = 52.0
        let btnW = 200.0
        randomLoadBtn.frame = CGRect(x: (screenW - btnW) / 2.0, y: screenH - btnH - 10.0, width: btnW, height: btnH)
        
        randomLoadBtn.addTarget(self, action: #selector(handleRandomLoad), for: .touchUpInside)
    }
    
    // MARK: - utils
    
    
    // MARK: - action
    fileprivate func handleVMRequestDidComplete(_ request: VNRequest, error: Error?) {
        let results = request.results as? [VNRecognizedObjectObservation]
        DispatchQueue.main.async {
            if let prediction = results?.first {
                self.drawingBoxesView?.drawBox(with: [prediction])
            } else {
                self.drawingBoxesView?.removeBox()
            }
        }
    }
    
    @objc fileprivate func handleRandomLoad() {
        let imageName = randomImageName()
        if let image = UIImage(named: imageName),
            let cgImage = image.cgImage,
            let request = coreMLRequest {
            displayImageView.image = image
            let handler = VNImageRequestHandler(cgImage: cgImage)
            try? handler.perform([request])
        }
    }
    
    // MARK: - other
    fileprivate func randomImageName() -> String {
        let maxNum: UInt32 = 18
        let minNum: UInt32 = 1
        // arc4random_uniform's upper bound is exclusive, so add 1 to cover images-1 through images-18.
        let randomNum = arc4random_uniform(maxNum - minNum + 1) + minNum
        let imageName = "images-\(randomNum).jpeg"
        return imageName
    }
}
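
The DrawingBoxesView used above is a small custom overlay; its full implementation is in the demo repo linked below. As a rough sketch of what such a view can look like, assuming Vision's normalized, bottom-left-origin bounding boxes (which is how VNRecognizedObjectObservation reports them):

import UIKit
import Vision

// Minimal sketch of a box-drawing overlay; the real DrawingBoxesView in the demo may differ.
final class DrawingBoxesView: UIView {
    private var boxLayers: [CALayer] = []

    func drawBox(with predictions: [VNRecognizedObjectObservation]) {
        removeBox()
        for prediction in predictions {
            // Vision bounding boxes are normalized with the origin at the bottom-left;
            // flip the y-axis and scale to this view's coordinate space.
            let box = prediction.boundingBox
            let rect = CGRect(x: box.minX * bounds.width,
                              y: (1 - box.maxY) * bounds.height,
                              width: box.width * bounds.width,
                              height: box.height * bounds.height)
            let boxLayer = CALayer()
            boxLayer.frame = rect
            boxLayer.borderColor = UIColor.red.cgColor
            boxLayer.borderWidth = 2
            layer.addSublayer(boxLayer)
            boxLayers.append(boxLayer)
        }
    }

    func removeBox() {
        boxLayers.forEach { $0.removeFromSuperlayer() }
        boxLayers.removeAll()
    }
}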

The final effect is as follows, and the complete code has been placed on GitHub: CreateMLDemo

image

Summary#

This article introduces the overall process of CreateML, from sample annotation to training and generating a model, and finally to using the model. I hope it gives everyone a general understanding of CreateML. Later, I will research whether models imported into a project can be updated, how to update them, and attempt to create my own SMS filtering app.
