SMS Filtering APP Development#
This article is published on Sohu Technology Products - SMS Filtering APP Development
I have always wanted to develop my own SMS filtering APP but never got around to it. I have finally settled down to build one, and I am recording the overall process as I go.
Spam SMS Samples#
The first problem: to filter spam SMS, we first need to be able to recognize it. How?
Drawing on my earlier experience training a model to count steel pipes, I decided to train a text classification model with Create ML and use it through Core ML. That raises the next question: where does the SMS dataset for training come from?
Initially I planned to find spam SMS samples online, but a long search turned up nothing. So I decided to use the messages on my own and my family's phones. SMS messages are rarely deleted, so each phone holds several thousand of them, including spam, promotions, advertisements, and so on.
So the question became: how do you export SMS from an iPhone?
Here too I searched for a long time and found that most third-party tools are paid. Eventually I found a free way to export.
First, back up the phone to the computer without encryption. As shown in the figure below, select `Back up all the data on your iPhone to this Mac`, click `Back Up Now`, and wait for the backup to complete. Then click `Manage Backups`.
After clicking `Manage Backups`, the interface is as follows. You can see the backup records. Right-click one and select `Show In Finder` to open it in Finder.
The backup directory is now open. Find the file named `3d0d7e5fb2ce288813306e4d4636395e047a3d28`; this is the database file for the SMS backup. With one folder after another in the backup directory, how do you find it? Simply search: click the search button in the upper right corner and enter the file name, making sure the search scope is the current folder.
The search results are as follows:
Then copy this file to another location, such as the desktop, and open it with a database tool such as `SQLPro for SQLite`, as shown below:
Inspecting the file shows that phone numbers and message bodies live in different tables, so an SQL query is needed to join them. The query below follows SQL to extract messages from backup. Select `Query` in the image above and enter the following:
SELECT datetime(message.date, 'unixepoch', '+31 years', '-6 hours') AS Timestamp,
       handle.id,
       message.text,
       CASE WHEN message.is_from_me THEN 'From me' ELSE 'To me' END AS Sender
FROM message, handle
WHERE message.handle_id = handle.ROWID
  AND message.text NOT NULL;
Then click execute in the upper right corner to list all the SMS messages.
Then select all rows, right-click, and choose `Export result set as` to export as `CSV`, producing a file that can be opened in Excel.
This way, the required SMS samples have been obtained.
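A side note on the query above: the `'unixepoch', '+31 years'` modifiers compensate for `message.date` counting from 2001-01-01 rather than 1970-01-01, and `'-6 hours'` looks like a fixed time zone offset that you would adjust to your own. If you prefer to post-process the raw values in Swift, a minimal sketch might look like this. The helper name and the threshold are hypothetical, and the premise that newer backups store nanoseconds instead of seconds is an assumption to verify against your own data.

```swift
import Foundation

/// Hypothetical helper: converts a raw `message.date` value into a `Date`.
/// Assumption: the value counts from 2001-01-01 00:00:00 UTC, in seconds on
/// older backups and in nanoseconds on newer ones.
func dateFromMessageTimestamp(_ raw: Int64) -> Date {
    // Assumption: anything above 1e11 cannot be a plausible seconds value
    // (that would be thousands of years away), so treat it as nanoseconds.
    let nanosecondThreshold: Int64 = 100_000_000_000
    let seconds: Double = raw > nanosecondThreshold
        ? Double(raw) / 1_000_000_000
        : Double(raw)
    // `timeIntervalSinceReferenceDate` uses the same 2001-01-01 origin.
    return Date(timeIntervalSinceReferenceDate: seconds)
}
```

The resulting `Date` is in UTC; formatting it with a `DateFormatter` set to your own time zone replaces the hard-coded hour offset.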
Training Recognition for Spam SMS#
With the samples in hand, let's look at how to train the recognizer. I plan to use Apple's Core ML, with the model trained in Create ML. So how is it used? What format must the samples be in? How long does training take?
First, create a text classification project in `Create ML`. Open the Xcode menu, click `Open Developer Tool`, and choose `Create ML`, as shown in the figure below:
Then select a folder and click `New Document`, as shown below:
Then select `Text Classification`, as shown in the figure below:
Next, enter the project name and description.
Click create in the lower right corner to enter the main interface, as shown below:
Click the detailed description under `Training Data` to see the format `Create ML` requires for text classification. It supports `JSON` and `CSV` files, as shown below:
The JSON format is as follows:
// JSON file
[
    {
        "text": "The movie was fantastic!",
        "label": "positive"
    }, {
        "text": "Very boring. Fell asleep.",
        "label": "negative"
    }, {
        "text": "It was just OK.",
        "label": "neutral"
    } ...
]
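If the labeled messages already live in code, the JSON layout above can be produced with `Codable`. This is only a sketch: `LabeledMessage` and `encodeTrainingData` are hypothetical names, and the only thing taken from the format description is the pair of keys `text` and `label`.

```swift
import Foundation

/// Hypothetical record type mirroring the `text` / `label` keys shown above.
struct LabeledMessage: Codable, Equatable {
    let text: String
    let label: String
}

/// Encodes the samples as a JSON array of `{ "text": ..., "label": ... }` objects.
func encodeTrainingData(_ messages: [LabeledMessage]) throws -> Data {
    let encoder = JSONEncoder()
    // Pretty printing and sorted keys only make the file easier to review by hand.
    encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
    return try encoder.encode(messages)
}
```

The returned `Data` can be written to disk with `data.write(to:)` and then selected as the training file.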
The CSV format consists of one column for `text` and one column for `label`:
text | label |
---|---|
This is a regular SMS | label1 |
This is a spam SMS | label2 |
Since the SMS messages were already exported as CSV in the previous step, we only need to rearrange the file into the format shown above. One problem remains: what values can the label take?
To answer that, we first need to look at the system's SMS filtering logic and the filtering categories it supports. Otherwise I might define my own categories only to find that the system does not support them, which would be awkward.
SMS Filtering Categories#
System SMS Filtering Logic#
According to SMS and MMS Message Filtering, developers cannot create new groups. An extension can only return one of the predefined categories for an `SMS` or `MMS` message received from an unknown sender.
Note that, according to the documentation, SMS filtering does not apply to iMessages or to messages from contacts in the address book; it only covers `SMS` and `MMS` from unknown senders.
SMS filtering is further divided into local judgment filtering and server-side judgment filtering, as illustrated below:
According to the documentation, even with server-side filtering the extension cannot access the network directly; the system talks to the configured server on its behalf. In addition, the App Extension cannot write data to the shared Group container. The extension can therefore read an SMS but cannot store or upload it, which protects privacy and security. For more on implementing server-side filtering, refer to Creating a Message Filter App Extension.
Next, let's look at the supported filtering types, `ILMessageFilterAction`.
There are five major categories:
- none: Not enough information to decide. The message is displayed, or the request is passed on for server-side filtering.
- allow: The message is displayed normally.
- junk: The message is kept out of the normal list and displayed under the junk SMS category.
- promotion: The message is kept out of the normal list and displayed under the promotional information category.
- transaction: The message is kept out of the normal list and displayed under the transaction information category.
Some of these also have subcategories, `ILMessageFilterSubAction`. For their specific meanings, refer to ILMessageFilterSubAction.
- none
- The supported subcategories for promotion include:
  - others
  - offers
  - coupons
- The supported subcategories for transaction include:
  - others
  - finance
  - orders
  - reminders
  - health
  - weather
  - carrier
  - rewards
  - publicServices
Here we only handle the major categories and ignore the subcategories. The label values to train are therefore clear: junk (spam SMS), promotion, and transaction. none and allow are not distinguished and are both treated as allow. The full set of labels is:
- allow
- junk
- promotion
- transaction
Next, each SMS in the exported `CSV` file needs a label. This can only be done by hand. The sample size and the quality of the labels determine the accuracy of later recognition. To keep the door open for subcategories later, label each message as what it really is; for example, do not put a message that belongs in promotion into junk.
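SMS bodies often contain commas, quotes, and line breaks, which break a naive CSV file. The sketch below shows one way to write the labeled samples with standard CSV quoting while restricting labels to the four values listed above. The type and function names are hypothetical, and the `text,label` header is taken from the column names described earlier.

```swift
import Foundation

/// The four labels chosen above; using an enum prevents typos in the label column.
enum SMSLabel: String, CaseIterable {
    case allow, junk, promotion, transaction
}

/// Quotes a field when it contains a comma, a double quote, or a line break,
/// doubling any embedded double quotes (standard CSV quoting).
func csvField(_ value: String) -> String {
    let needsQuoting = value.unicodeScalars.contains {
        $0 == "," || $0 == "\"" || $0 == "\n" || $0 == "\r"
    }
    guard needsQuoting else { return value }
    return "\"" + value.replacingOccurrences(of: "\"", with: "\"\"") + "\""
}

/// Builds the whole CSV document: a header row followed by one row per sample.
func makeTrainingCSV(_ rows: [(text: String, label: SMSLabel)]) -> String {
    var lines = ["text,label"]
    for row in rows {
        lines.append(csvField(row.text) + "," + row.label.rawValue)
    }
    return lines.joined(separator: "\n")
}
```

Checking the scalars rather than the characters is deliberate: Swift treats `\r\n` as a single `Character`, so a character-level search for `\n` would miss Windows-style line breaks.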
Once every SMS sample is labeled, the file can be imported into `Create ML` for training to generate the required model. The steps are as follows:
First, import the dataset.
Then click `Train` in the upper left corner.
Once training is complete, you can click Preview to simulate SMS text and see the predicted output, as shown in the figure below:
Finally, export the model for APP use.
APP Development#
Create a new project, use new Bing to generate images for the APP icon, and use ChatGPT-4 to come up with the APP name. Then add a `Message Filter Extension` target, as shown in the figure below:
In `MessageFilterExtension.swift`, Apple's template already implements the basic framework. You only need to add the filtering logic at the `// TODO:` comments.
Then add the trained model to the project. Make sure its target membership includes both the main APP target and the `Message Filter Extension` target, since the extension uses the model for filtering.
The specific usage is as follows:
import Foundation
import CoreML
import IdentityLookup

/// Labels produced by the trained model. The raw values must match the
/// labels used in the training dataset.
enum SMSFilterActionType: String {
    case transaction
    case promotion
    case allow
    case junk

    /// Maps the model label to the corresponding system filter action.
    func formatFilterAction() -> ILMessageFilterAction {
        switch self {
        case .transaction:
            return ILMessageFilterAction.transaction
        case .promotion:
            return ILMessageFilterAction.promotion
        case .allow:
            return ILMessageFilterAction.allow
        case .junk:
            return ILMessageFilterAction.junk
        }
    }
}

struct SMSFilterUtil {
    /// Classifies a message body with the bundled model.
    /// Returns `.none` if the model fails to load or predicts an unknown label.
    static func filter(with messageBody: String) -> ILMessageFilterAction {
        var filterAction: ILMessageFilterAction = .none
        let configuration = MLModelConfiguration()
        do {
            // `SmsClassifier` is the class Xcode generates from the imported model.
            let model = try SmsClassifier(configuration: configuration)
            let resultLabel = try model.prediction(text: messageBody).label
            if let resultFilterAction = SMSFilterActionType(rawValue: resultLabel)?.formatFilterAction() {
                filterAction = resultFilterAction
            }
        } catch {
            print(error)
        }
        return filterAction
    }
}
Then, in `MessageFilterExtension.swift`, have the `offlineAction(for queryRequest: ILMessageFilterQueryRequest)` method call this utility, as follows:
@available(iOSApplicationExtension 16.0, *)
private func offlineAction(for queryRequest: ILMessageFilterQueryRequest) -> (ILMessageFilterAction, ILMessageFilterSubAction) {
    // Without a message body there is nothing to classify.
    guard let messageBody = queryRequest.messageBody else {
        return (.none, .none)
    }
    // Only the major category is predicted; the subcategory is left as `.none`.
    let action = SMSFilterUtil.filter(with: messageBody)
    return (action, .none)
}
Note the minimum version requirements for the APP: `ILMessageFilterSubAction` is only supported on iOS 16 and above, while the `promotion` and `transaction` values of `ILMessageFilterAction` are supported on iOS 14 and above.
If you want finer-grained `SubAction` filtering, relabel the SMS dataset above with finer-grained labels and train a new model on it.
Additionally, `ILMessageFilterQueryRequest` exposes `sender` and `messageBody`. To implement custom rules, for example a rule for a specific phone number, define the rules in the APP, share them with the Extension through the shared Group, and match them in the method above.
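The custom-rule idea can be sketched as follows. Everything here is hypothetical: the rule shape (a sender prefix mapped to a category), the type names, and the choice of JSON as the storage format. On a device, the APP would write the encoded rules into the shared Group, for instance through `UserDefaults(suiteName:)` with your own group identifier, and the Extension would only read them.

```swift
import Foundation

/// Hypothetical rule categories, mirroring the four labels used for training.
enum RuleAction: String, Codable {
    case allow, junk, promotion, transaction
}

/// Hypothetical rule: any sender starting with `senderPrefix` gets `action`.
struct SenderRule: Codable, Equatable {
    let senderPrefix: String
    let action: RuleAction
}

/// Called in the APP before storing the rules in the shared Group.
func encodeRules(_ rules: [SenderRule]) throws -> Data {
    try JSONEncoder().encode(rules)
}

/// Called in the Extension after reading the stored data back.
func decodeRules(_ data: Data) throws -> [SenderRule] {
    try JSONDecoder().decode([SenderRule].self, from: data)
}

/// Returns the action of the first rule whose prefix matches, or nil if none does.
func matchRule(sender: String, rules: [SenderRule]) -> RuleAction? {
    rules.first(where: { sender.hasPrefix($0.senderPrefix) })?.action
}
```

In `offlineAction`, a non-nil result from `matchRule` would be mapped to the corresponding `ILMessageFilterAction` before falling back to the model.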
Summary#
I believe that through the above steps, everyone can develop their own SMS filtering APP.
The steps above rely on a fixed, pre-trained model:
- Obtain the SMS dataset
- Use Create ML to train on the dataset and generate the model
- Use the model in the project for judgment
A model generated this way is static: every update requires retraining, re-importing, and shipping a new version of the APP. Is there a better way?
For example, can we train and update in the APP? Or can we use a combination of local rules, local models, and network models?
Assuming Solution One:
First, train and update the model inside the APP. The general idea is as follows:
Updating the model requires both a piece of data and its classification. So training in the APP needs the classification to come from somewhere other than the model itself; using the model to classify a message and then training the model on its own output is not very meaningful. Obtaining the classification from custom rules, and then using the data and that classification to update the model, should be feasible.
Assuming Solution Two:
Next, consider a more complete approach that combines local rules, a local model, and a network model:
The logic: match against local rules first. If no local rule matches, fall back to the local model. If the local model cannot decide either, query the server, which hosts a continuously trained and updated model, to obtain the classification. Finally, with each APP release, bundle the latest model from the server into the project.
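The fallback order described above can be sketched as a chain of local stages, each of which either returns a category or declines. All names are hypothetical, and what it means for the local model to "not match" (for example, low confidence) is left to the stage itself. In a real extension, the final case would correspond to handing the request to the system for server-side filtering, as the Xcode template does with `deferQueryRequestToNetwork`, rather than calling the server directly.

```swift
import Foundation

/// Hypothetical categories, mirroring the four training labels.
enum Category: String {
    case allow, junk, promotion, transaction
}

/// A local stage (rules or model) returns a category, or nil when it cannot decide.
typealias Classifier = (_ sender: String?, _ body: String) -> Category?

/// Outcome of the local chain: a decision, or a request for server-side filtering.
enum Decision: Equatable {
    case resolved(Category)
    case deferToServer
}

/// Runs the local stages in order and stops at the first one that decides.
func classify(sender: String?, body: String, localStages: [Classifier]) -> Decision {
    for stage in localStages {
        if let category = stage(sender, body) {
            return .resolved(category)
        }
    }
    return .deferToServer
}
```

Passing `[rulesStage, modelStage]` as `localStages` gives the order described above; adding or reordering stages does not change the chain itself.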
Assuming Solution Three:
Solution Two depends on a network model, which presumes a server hosting a continuously trained and updated model. What if that is not available? With only local rules, a local model, and an occasionally refreshed dataset, is there a way to update the local model over the air?
Currently the local model is bundled in the APP's main Bundle. Instead, it could be copied into the Group container shared by the APP and the Extension on first launch. Each time the APP opens, it checks whether a newer model is available and, if so, downloads it and replaces the model file in that directory. The Extension then loads the model file from that directory by URL for filtering.
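The replace-if-newer step might be sketched like this, using only `FileManager`. The function name and the integer version scheme are assumptions. In a real APP, the destination would presumably sit under the directory returned by `FileManager.containerURL(forSecurityApplicationGroupIdentifier:)`, a downloaded `.mlmodel` would first need compiling with `MLModel.compileModel(at:)`, and the Extension would load the result with `MLModel(contentsOf:)`; none of that is shown here.

```swift
import Foundation

/// Hypothetical helper, meant to run in the APP (not the Extension).
/// Copies `downloaded` over `installed` when its version is newer than the
/// installed one, and returns true if a replacement happened.
func installModelIfNewer(downloaded: URL, downloadedVersion: Int,
                         installed: URL, installedVersion: Int?) throws -> Bool {
    // Keep the current model when it is already up to date.
    if let current = installedVersion, current >= downloadedVersion {
        return false
    }
    let fileManager = FileManager.default
    // Make sure the destination directory exists (first launch).
    try fileManager.createDirectory(at: installed.deletingLastPathComponent(),
                                    withIntermediateDirectories: true)
    // Remove the old model first; `copyItem` fails if the destination exists.
    if fileManager.fileExists(atPath: installed.path) {
        try fileManager.removeItem(at: installed)
    }
    // Works for single files and for directories such as a compiled model.
    try fileManager.copyItem(at: downloaded, to: installed)
    return true
}
```

Passing `nil` as `installedVersion` covers the first launch, when the bundled model is copied into the shared directory.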
The flowcharts for several solutions are as follows:
Summary as follows: