Reading
- balanced classification: the number of examples in each class is roughly equal -> receiver operating characteristic (ROC) curve
- class-imbalanced problems: the number of examples in one class is much larger than in the others (e.g., fraud detection, where only 1% of transactions are fraudulent), ranking problems, or multilabel classification -> precision and recall, as well as a weighted form of accuracy or ROC AUC; see the metric sketch below
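A minimal sketch of computing these metrics, assuming scikit-learn is available; `y_true` and `y_score` are hypothetical labels and predicted scores:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, roc_auc_score

# Hypothetical imbalanced labels (1 = fraud) and model scores in [0, 1]
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_score = np.array([0.1, 0.2, 0.05, 0.3, 0.15, 0.4, 0.1, 0.6, 0.8, 0.35])
y_pred = (y_score >= 0.5).astype(int)  # threshold scores into hard labels

print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall:   ", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("ROC AUC:  ", roc_auc_score(y_true, y_score))   # threshold-independent
```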
Develop the Model
1. Vectorization (to tensors of float32); see the preprocessing sketch after this list
2. Normalization (scale values to roughly 0-1)
3. Validation
a. K-fold cross-validation -> when you have too few samples for a reliable held-out validation set (you reshuffle the data into K folds and retrain once per fold); see the K-fold sketch after this list
b. Iterated K-fold validation -> performing highly accurate model evaluation when little data is available
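A minimal preprocessing sketch for steps 1-2, assuming NumPy arrays of raw uint8 pixel data (`raw` and `y` are hypothetical names):

```python
import numpy as np

# Hypothetical raw data: uint8 pixel values in [0, 255], plus binary labels
raw = np.random.randint(0, 256, size=(1000, 28, 28), dtype=np.uint8)
y = np.random.randint(0, 2, size=(1000,))

x = raw.reshape((1000, 28 * 28))   # vectorize: flatten each sample into a vector
x = x.astype("float32") / 255.0    # cast to float32 and scale into [0, 1]
```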
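And a minimal K-fold validation loop, reusing `x`/`y` from the sketch above and assuming a hypothetical `build_model()` that returns a freshly compiled Keras model:

```python
import numpy as np

k = 4
fold_size = len(x) // k
val_scores = []

for fold in range(k):
    # Hold out one fold for validation, train on the remaining folds
    val_x = x[fold * fold_size:(fold + 1) * fold_size]
    val_y = y[fold * fold_size:(fold + 1) * fold_size]
    train_x = np.concatenate([x[:fold * fold_size], x[(fold + 1) * fold_size:]])
    train_y = np.concatenate([y[:fold * fold_size], y[(fold + 1) * fold_size:]])

    model = build_model()  # hypothetical: returns a fresh, compiled model
    model.fit(train_x, train_y, epochs=10, batch_size=32, verbose=0)
    val_scores.append(model.evaluate(val_x, val_y, verbose=0))

print("mean validation score:", np.mean(val_scores, axis=0))
```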
Note:
- Take small values (roughly 0-1)
- Homogeneous (all features in roughly the same range)
- Consider replacing a missing value in a feature with the median/average instead of 0 to avoid a discontinuity; see the imputation sketch after this list
- Overfit first to the training set, evaluate validation data, modify model, retrain, evaluate validation data, repeat
- Overfit by adding layers, making layers bigger, and training for more epochs
- Be mindful! Every time we tune based on a validation evaluation, it LEAKS info from the validation process into the model. Don’t do it too many times! Otherwise, it may cause overfitting to the validation process. If this happens, you may want to switch to K-fold validation
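A minimal imputation sketch for the missing-value note, assuming the feature matrix is a NumPy float array with NaN marking missing entries:

```python
import numpy as np

# Hypothetical feature matrix with missing entries marked as NaN
features = np.array([[1.0, np.nan], [2.0, 4.0], [3.0, 6.0]])

# Replace each missing value with its column's median
# (in practice, compute the medians on the training data only)
medians = np.nanmedian(features, axis=0)
rows, cols = np.where(np.isnan(features))
features[rows, cols] = medians[cols]
```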
Variables to take note of:
- Feature engineering -> feature selection
- Architecture -> what type of model are you using?
- Training configuration -> loss function? Batch size? Learning rate? (see the sketch below)
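A minimal sketch of where those training-configuration knobs live in Keras (the architecture itself is a hypothetical two-layer example):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Architecture: a hypothetical small stack of Dense layers
model = keras.Sequential([
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# Training configuration: loss function and learning rate
model.compile(
    optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Batch size is set at fit time (x/y are the hypothetical training arrays)
# model.fit(x, y, batch_size=32, epochs=10)
```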
Deploy the Model
False negative -> e.g., fraudulent transactions that are missed
False positive -> e.g., valid transactions marked as fraud
Methods:
- REST API: be mindful that the app doesn’t have a strict latency requirement (~500 ms is acceptable) and that the input data sent isn’t sensitive (it will be in decrypted form on the server for the model to see)
- Install TensorFlow on a server/cloud instance (build with Flask or another Python web framework, Django, or TF Serving)
- Query the model via the REST API; see the Flask sketch after this list
- On a device: CPU, microcontroller; for when the model needs to run in a low-connectivity environment, under strict latency constraints, with small size / memory constraints, doesn’t need to be highly accurate, or the input data is sensitive
- Deploy with TF Lite; see the conversion sketch after this list
- Browser / JS apps (directly): offload compute to the client, reduce server cost; for when the input data needs to stay on the client (sensitive), strict latency constraints, low connectivity, and the model is small (similar to on-device, but in the browser/JS app)
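A minimal Flask serving sketch for the REST API option, assuming a saved Keras model at a hypothetical path `model.keras`:

```python
import numpy as np
from flask import Flask, request, jsonify
from tensorflow import keras

app = Flask(__name__)
model = keras.models.load_model("model.keras")  # hypothetical saved model

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"inputs": [[...feature values...]]}
    inputs = np.array(request.get_json()["inputs"], dtype="float32")
    preds = model.predict(inputs)
    return jsonify({"predictions": preds.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```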
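And the on-device path: converting a Keras model with the TF Lite converter (a sketch; `model` is the hypothetical Keras model from above):

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Write the flatbuffer to disk for deployment on the device
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```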
Inference Model Optimization
- Weight pruning: reduces memory footprint because you reduce the number of parameters in the layers of the model
- Weight quantization: convert the float32 weights to int8 (a quarter of the size, with little loss in accuracy!); see the sketch below
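A minimal post-training quantization sketch using the TF Lite converter (this enables dynamic-range weight quantization; full int8 quantization would additionally require a representative dataset):

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)  # model as above
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # quantize weights
quantized_tflite_model = converter.convert()
```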