Orthogonalization
Orthogonalization means breaking a complex problem down into manageable, independent components, so that each issue in an ML system is addressed separately, minimizing overlap and confusion in problem-solving. The principle works by aligning model performance to specific goals, each tackled in a stepwise manner, as in the diagnostic sketch after this list:
Fit the training set well on the cost function
Ensure the model is complex enough and trained sufficiently to achieve low training error. If this fails, adjust factors like model architecture, training duration, or optimization techniques.
Perform well on the dev set
If the model performs poorly on the dev set but not on the training set, it indicates overfitting. Address this with techniques like regularization, dropout, or data augmentation.
Perform well on the test set
If test-set performance is poor relative to the dev set, investigate a mismatch between the training/dev and test distributions. Solutions might include collecting more representative data or fine-tuning on test-like data.
Perform well in the real world
Sometimes even good test-set performance doesn’t translate to real-world success. This might require redefining metrics, retraining on updated datasets, or deploying additional error analysis.
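To make the checklist concrete, here is a minimal sketch in Python of how these goals can be checked one at a time. The error values, the `target` level, and the 0.02 gap thresholds are hypothetical, illustrative choices, not part of the original framework:

```python
# A minimal sketch of the orthogonalization checklist as a diagnostic.
# All error values and thresholds are hypothetical, for illustration only.

def diagnose(train_error: float, dev_error: float, test_error: float,
             target: float = 0.05) -> str:
    """Suggest which 'knob' to turn, one goal at a time."""
    if train_error > target:
        # Goal 1: fit the training set -> bigger model, longer training,
        # better optimization.
        return "High training error: increase model capacity or train longer."
    if dev_error - train_error > 0.02:
        # Goal 2: generalize to the dev set -> regularization, dropout,
        # data augmentation, more data.
        return "Dev gap: add regularization, dropout, or more training data."
    if test_error - dev_error > 0.02:
        # Goal 3: generalize to the test set -> check for distribution
        # mismatch between training/dev and test data.
        return "Test gap: investigate data mismatch; collect representative data."
    # Goal 4: real-world performance is validated outside this function.
    return "Metrics look fine; validate in the real world next."

print(diagnose(train_error=0.12, dev_error=0.13, test_error=0.14))
# -> "High training error: increase model capacity or train longer."
```

The point of checking the goals in this order is the orthogonality itself: each test isolates one problem, so fixing it does not interfere with the others.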
Define satisficing and optimizing metrics upfront
The optimizing metric is the primary metric you are trying to maximize or minimize. It represents the main objective of the ML system and is often the number you care about improving the most.
Examples: Accuracy, F1-score, ROC-AUC, Mean Squared Error (MSE), etc.
A satisficing metric is a secondary constraint or requirement that the model must meet. Unlike the optimizing metric, the goal is not to maximize or minimize it but to ensure it stays above or below a specified threshold.
Examples: Inference latency, memory consumption, or false-positive rates.
While optimizing metrics drive the core model improvements, satisficing metrics ensure the system operates within acceptable real-world constraints. Neglecting them can lead to ML systems that are inefficient, unethical, or unusable in real life, even if the optimizing metric is excellent.
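As a quick illustration of how the two kinds of metrics interact, the sketch below picks a model by maximizing accuracy (the optimizing metric) subject to a latency budget (the satisficing metric). The candidate models, their numbers, and the 100 ms budget are all hypothetical:

```python
# A minimal sketch of model selection with one optimizing metric (accuracy)
# and one satisficing metric (latency). Candidate numbers are hypothetical.

candidates = [
    {"name": "model_a", "accuracy": 0.92, "latency_ms": 310},
    {"name": "model_b", "accuracy": 0.90, "latency_ms": 95},
    {"name": "model_c", "accuracy": 0.88, "latency_ms": 40},
]

LATENCY_BUDGET_MS = 100  # satisficing threshold: must stay at or below this

# Keep only models that satisfy the constraint...
feasible = [m for m in candidates if m["latency_ms"] <= LATENCY_BUDGET_MS]

# ...then pick the one that maximizes the optimizing metric.
best = max(feasible, key=lambda m: m["accuracy"])
print(best["name"])  # -> "model_b": fast enough, and the most accurate of those
```

Note that if no candidate meets the budget, `feasible` is empty and `max` raises a `ValueError`; in practice you would handle that case by relaxing the constraint or training a smaller model.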
Defining and comparing to human-level performance
Every task has a Bayes error: the theoretical minimum error achievable on that task. Human-level performance serves as a natural benchmark for tasks that humans already do well, such as image recognition, natural language processing, or medical diagnosis. For such tasks, human-level error often acts as a proxy for Bayes error.
In some domains, such as logistics optimization or large-scale recommendation systems, ML models can significantly outperform humans due to their ability to process vast datasets quickly and consistently. Defining human-level performance helps quantify the gap between human capability and ML potential, driving innovation and showcasing the value of ML solutions.
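Treating human-level error as a proxy for Bayes error also tells you where to focus next: the gap between training error and human-level error is the avoidable bias, while the gap between dev error and training error reflects variance. A minimal sketch, with purely hypothetical error values:

```python
# A minimal sketch using human-level error as a proxy for Bayes error.
# All error values are hypothetical, for illustration only.

human_error = 0.01   # proxy for Bayes error on, say, an image-recognition task
train_error = 0.08
dev_error = 0.10

avoidable_bias = train_error - human_error  # 0.07: gap to human level
variance = dev_error - train_error          # 0.02: gap between train and dev

if avoidable_bias > variance:
    # Far from human-level performance: focus on fitting the training set
    # (bigger model, longer training, better architecture).
    print("Focus on reducing bias.")
else:
    # Close to human level but not generalizing well: focus on
    # regularization, more data, or data augmentation.
    print("Focus on reducing variance.")
```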
If you're an ML practitioner who wants to go beyond models in Jupyter Notebooks, check out my short course "Build ML-based service on Azure from scratch", where for just 12€ you can learn how to build an ML-based application with FastAPI and a Streamlit UI and deploy it on Azure.