Professional Machine Learning Engineer

🏆 Passed: April 18, 2025 | ☔️ Failed: March 30, 2025

Exam Information

Exam Name: Professional Machine Learning Engineer

  • Date: 18 April 2025
  • Time: 01:30 PM

Post-Exam Reflection Memo (2025/04/19)

Exam Information / Impressions:

- 50 questions, 120 minutes, English
- The exam felt easier the second time around, but it's still in the category of highly difficult exams, testing a combination of fundamental ML knowledge and Google Cloud's ML service use cases.
- It is undoubtedly the most difficult among the Professional exams.
- I barely finished the first pass in about 110 minutes and could only review a few questions.
- I only skipped about one question, so I feel lucky to have passed with my first-pass answers.
- My feeling is that my score was barely 70%.

Topic Trends:

Fundamental Knowledge:
- Multiple questions on model interpretability / Explainable AI
- Integrated gradients
- Shapley Explanation
- Many complex questions testing the understanding of when to use regression vs. classification problems.
- Many questions on tool selection for image recognition / anomaly detection.
- Many questions about confusion matrices.
- Matrix Factorization (for recommendations) was mentioned multiple times in incorrect answer choices.

Service Knowledge:
- BQML in general
- Vertex AI in general
- Model Garden
- Vertex AI Workbench
- Vertex AI Vizier
- TensorFlow Lite (multiple questions)
- Questions on selecting TensorFlow AI accelerators (TPU/GPU/CPU).
- Questions on technology selection/design for data pipelines / CI/CD.
- Questions about migrating/porting from Scikit-learn to Google Cloud.
- Questions about Request-Response Logging / adjusting the sampling rate.

Note:
- I don't recall many questions about reinforcement learning.

TODOs for Exam Day (Successful cases from PSE / PNE / PDE / PCA exams) ⭐️

The Day Before

  • Get a good night's sleep
    • Set up eye mask, earplugs, pillow

On the Day

  • Wake up by 9 AM (important to be well-rested)
  • Print the exam confirmation email
    • Forward the email to an app
    • Print at a convenience store
  • Do a final review at a cafe
    • Feeling like Doutor near Sapporo Station?
  • Take a 10-minute nap before arriving at the venue to fully refresh my brain
    • Get enough sugar too
  • Arrive at the venue 30 minutes before the test and complete check-in
    • Be conscious of reading the answer choices first
    • Be mindful of saving time for review
    • Have the courage to skip questions if I can't understand the English or if they are too difficult

🔥Study Strategy🔥

Learning Strategy:
  • Reference Materials
  • General Learning Materials
  • Old Practice Questions | Spent time reviewing incorrect questions (from 2025/04/07)
    • Google Cloud Certified Professional Machine Learning Engineer Practice Tests
    • 1st round:
      • Free test | 85% | 2025/03/12
      • Practice Test I | 40% | 2025/03/10
      • Practice Test 2 | 40% | 2025/03/12
      • SectionTest | ~50% | 2025/03/17
    • Review:
      • Free test
      • Practice Test I | 2025/03/11
      • Practice Test 2
      • SectionTest | Reviewed by re-solving questions | Averaging 70-80%
    • 2nd round:
      • Free test | 2025/03/21 | 95%
      • Practice Test I | 2025/03/21 | 65%
      • Practice Test 2 | 2025/03/21 | 75%
      • SectionTest |
    • 3rd round (study after failing):
      • Free test | 2025/04/05 | 90%
      • Practice Test I | 2025/04/04 | 76% ⤴︎
      • Practice Test 2 | 2025/04/04 | 73% | ⤵️
      • SectionTest | 2025/04/03 | Average over 80%
    • Review weak topics: Organize as needed
      • 2025/03/17
      • 2025/03/22
    • Practice Exam
  • New Practice Questions (from 2025/04/07)
    • Google GCP ML Engineer Certification Practice Updated Exam
      • Strongly reflects the latest trends; I started backwards from Test 5
    • 1st round: The test order got messed up, so this is approximate
      • Practice Test I | 2025/04/15 | 50%
      • Practice Test 2 | | 40%
      • Practice Test 3 | | 30%
      • Practice Test 4 | 2025/04/09 | 37%
      • Practice Test 5 | 2025/04/08 | 55%
    • 2nd round:
      • Practice Test I | |
      • Practice Test 2 | 2025/04/16 | 30%
      • Practice Test 3 | |
      • Practice Test 4 | |
      • Practice Test 5 | |
    • Review:
      • Practice Test I | 2025/04/15 | Using PerplexityAI is convenient
      • Practice Test 2 | 2025/04/16 | Considered the 2nd round as review
      • Practice Test 3 | 2025/04/17 | Review is going well
      • Practice Test 4 | 2025/04/17 | Review is going well
      • Practice Test 5 | 2025/04/17 | Going well, I guess. Lots of incorrect questions, which is a pain.

Weak Areas

Official References:

Useful Sites:

ML Pipeline Architecture Diagrams


Important English Terms

Anomaly

"Anomaly" has different meanings in English and machine learning (ML), but it basically means "abnormality" or "exception".

Annotation

In English, it refers to a "note" or "explanation". In the context of machine learning, it means attaching labels or additional information to data.

skew

English meaning: "slant" or "distortion". A state where something is not straight but diagonal. ML meaning: The distribution of data is biased. It's especially used when a normal distribution is skewed.

drift

English meaning: movement, trend, bias. ML meaning: a phenomenon where the model or the data distribution changes over time, causing prediction accuracy to decline; used especially when the distributions of the training data and the live production data diverge.

Curation

English meaning: "Curation" mainly refers to the act of organizing, selecting, and managing information or items. Machine learning (ML) meaning: "Curation" refers to the process of selecting, organizing, and managing the quality of a training dataset.

Pickle

English meaning: "Pickle" is the process of preserving food (especially vegetables and fruits) in vinegar, salt, etc. Machine learning (ML) meaning: A "pickled model" is a trained model that has been serialized with Python's pickle library and saved for later reuse.

Heuristic

English meaning: A "heuristic" is a method of quickly obtaining a solution based on intuition or rules of thumb when solving problems or making decisions. Machine learning (ML) meaning: A "heuristic" is an approximate method or strategy for finding an optimal solution, which derives a solution efficiently while saving computational resources.

Oscillations

English meaning: "vibration" or "swaying". A phenomenon where an object moves back and forth repeatedly around a central point. ML meaning: A phenomenon in a model's prediction or learning process where errors or values do not stabilize but repeatedly fluctuate up and down. It's often associated with a learning rate that is too high or with overfitting. Countermeasure: Lowering the learning rate can suppress prediction fluctuations.

Holdout data | Machine Learning Glossary

English meaning (Holdout): a person who refuses to cooperate or compromise; an act of resistance or refusal. ML meaning: Samples that are intentionally not used ("held out") during training. Example: Validation dataset and test dataset (data intentionally excluded from training).

Model Interpretability / Explainable AI
  • Attention Mechanisms Visualization: Improves interpretability, especially in Natural Language Processing (NLP) and image recognition, by showing which parts of the input the model is focusing on. Identifying the basis for decisions by visualizing the weighting of words or image regions in machine translation or image captioning.
  • SHAP (Shapley Additive Explanations): Quantifies the impact of each feature on a prediction based on game theory, providing an explanation for the prediction. Quantifying the contribution of each feature to an individual prediction (Shapley values) and explaining black-box models.
  • Linear Regression: Since the features and predicted value have a linear relationship, it's easy to interpret the basis of the prediction from the feature coefficients. Interpreting the direct impact on the result using feature coefficients and predictive modeling.
  • Decision Tree: The branching conditions are explicit, and the prediction process can be traced in a tree structure, making it easy to understand the decision-making process. Classification/regression based on an explicit tree structure with branching rules, and visualization of the decision-making process.
  • Logistic Regression: In binary classification, the impact of feature weights on class prediction is interpretable. Evaluating the influence of each feature on class prediction using odds ratios in binary classification problems.
  • k-Nearest Neighbors (k-NN): Since predictions for new data are based on the nearest training data, it can show specific examples that contributed to the prediction. Classification/regression based on neighboring data points, and presenting similar instances as the basis for prediction.
  • Generalized Additive Model (GAM): Allows for independent evaluation of the impact of each feature on the prediction, making interpretation easy even for complex relationships. Interpreting the non-linear effects of each feature using smoothing functions, etc., and predictive modeling assuming additivity.
  • Random Forest: A model that combines multiple decision trees, it helps understand predictions by evaluating feature importance. High-precision ensemble prediction and identification of prediction factors by feature importance (e.g., mean impurity decrease).
  • Grad-CAM (Gradient-weighted Class Activation Mapping): Visualizes the image regions a CNN focuses on when predicting a specific class with a heatmap, showing the basis for the image recognition model's decision. Visualizing the basis for a decision in CNN image classification by using gradient information to create a heatmap of the relevant region.
  • LIME (Local Interpretable Model-agnostic Explanations): Explains the decisions of a black-box model by approximating the predictions of a complex model with a locally interpretable model. Interpretation of individual predictions of a black-box model using a local surrogate model (e.g., a linear model).
  • Feature Visualization: Generates input patterns that activate each layer or neuron of a neural network, allowing for a visual understanding of the network's internal representations. Visual understanding of learned features by generating input patterns that maximally activate specific neurons or layers inside a neural network.
  • Saliency Maps: Visualizes the influence of each pixel in an input image on the classification result, identifying important visual features. Pixel-level importance visualization using the gradient of the output with respect to the input in image classification.
  • Grad-CAM for X-ray Images: In anomaly detection for X-ray images, it visualizes the anatomical regions the model is focusing on, allowing doctors to confirm the basis for the diagnosis. Assisting doctors' decisions in medical imaging AI by using Grad-CAM to visualize the specific anatomical regions that form the basis of a diagnosis.
  • Uncertainty Estimation: Supports reliable decision-making by quantifying and visualizing the uncertainty of predictions, thereby evaluating the model's confidence. Evaluating model prediction confidence using methods like Bayesian estimation, ensembles, or Monte Carlo dropout, and supporting decision-making by identifying high-risk predictions.
  • Counterfactual Explanations: Provides practical insights by showing "how the input should be changed to alter this prediction result." Exploring minimal input perturbations to change a prediction result using optimization algorithms, and suggesting concrete actions.

Related: Overview of BigQuery Explainable AI

Explainable AI (also known as XAI) helps you understand the results that your predictive ML models generate for classification and regression tasks by defining how each feature in a row of data contributed to the predicted result. 👀 Check the list table


Supervised Learning
  • Support Vector Machine (SVM) A machine learning algorithm used for data classification and regression. It achieves high versatility by determining a boundary (hyperplane) that separates two classes with the maximum margin.
  • Naive Bayes A probabilistic method primarily used for categorical classification.
  • KNN (K-nearest neighbor) Its common use is for class classification.
  • Logistic Regression Although it has "regression" in its name, its purpose is probabilistic class classification.
  • Feedforward Neural Network A neural network that sequentially propagates information from the input layer to the output layer. It performs calculations at each layer to ultimately generate an output. Mainly used for classification and regression problems. For example, it is used for tasks such as customer purchase prediction and stock price prediction models.
  • CNN (Convolutional Neural Network) Frequently used for classification tasks, including image recognition.
  • Ensemble A method that improves prediction accuracy by combining multiple classifiers (e.g., bagging, boosting, stacking).
  • XGBoost (Extreme Gradient Boosting) A gradient boosting method, often used for classification problems.
  • ROC-AUC An evaluation metric for classification models (Area Under the ROC Curve).
  • attention based models Frequently used in text and image classification tasks.
  • [[Confusion Matrix]] | See separate sheet 🔥
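Since confusion matrices were a recurring exam topic, here is a minimal scikit-learn sketch of extracting the four cells (the toy labels are made up):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, ravel() unpacks the 2x2 matrix in this fixed order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
# Precision = TP / (TP + FP), Recall = TP / (TP + FN)
```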
Unsupervised Learning
  • K-means (Clustering) A method that automatically classifies given data into K clusters. It performs clustering by calculating the center point (centroid) of each cluster and assigning data to the nearest center point. Mainly used in unsupervised learning.
  • PCA (Principal Component Analysis) A method for reducing the dimensionality of multidimensional data while preserving its characteristics. It helps in data visualization and processing efficiency by finding a new basis that maximizes the data's variance and extracting the most important features.
  • N-gram A method that treats a sequence of N words or characters as a single unit. It is often used in natural language processing to analyze the context of text and is utilized in tasks like text generation and sentiment analysis.
  • Feature Clipping (Gradient Clipping) A method that limits excessive values when features or gradients exceed a certain threshold. This improves training stability and can prevent problems such as vanishing or exploding gradients.
  • Log Scaling Normalization A method that adjusts the scale by applying a logarithmic transformation, especially to data with a right-skewed distribution. It helps to streamline model training by treating a wide range of values more uniformly.
  • Feature Crosses A method that creates new features by multiplying multiple features together. It can improve prediction accuracy by incorporating non-linear relationships into the model.
  • SMOTE (Synthetic Minority Over-sampling Technique) | Oversampling method If there is little fraudulent usage data, SMOTE can be used to increase fraudulent samples, enabling the model to accurately detect fraud. Used in binary and multi-class classification (e.g., credit card fraud detection).
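A minimal sketch of SMOTE with the imbalanced-learn library; the synthetic "fraud-like" dataset is an assumption for illustration:

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Imbalanced binary data: about 5% minority ("fraud") class.
X, y = make_classification(n_samples=1000, weights=[0.95], random_state=0)
print("before:", Counter(y))

# SMOTE synthesizes new minority samples by interpolating between
# a minority point and its nearest minority-class neighbors.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("after:", Counter(y_res))
```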
Reinforcement Learning
  • Q learning

    Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \left( R(s, a) + \gamma \max_{a'} Q(s', a') \right)

    This formula is the Q-learning update rule, representing the process of "learning from experience." The left side is the updated Q-value, and the right side is a weighted average of the part that retains current knowledge and the part that incorporates new information. α is the learning rate (0 to 1); a larger value gives more weight to new experiences. R is the immediate reward, γ is the discount factor (the importance of future rewards), and max_{a'} Q(s', a') is the value of the optimal action in the next state. By repeating this update, the agent learns an optimal action strategy. (A NumPy sketch of this update rule follows at the end of this list.)

  • SARSA (State-Action-Reward-State-Action): An on-policy reinforcement learning algorithm. It updates the value function based on the chosen action, allowing for safer exploration. Unlike Q-learning, a key feature is that it updates using the result of the action actually taken.

    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right)

    In the SARSA algorithm, when updating the action-value function Q(s_t, a_t), the value for the current state s_t and action a_t is adjusted based on the next state s_{t+1} and its action a_{t+1}. The update formula reflects the difference between the current value and the value in the next state, adjusting the influence of the next state using the learning rate α and discount rate γ. This allows the agent to improve its value function each time it receives a reward.

  • Monte Carlo Methods: A method that updates the action-value function based on the outcome of a complete episode. Learning proceeds by averaging the rewards over an entire episode and evaluating each action. Monte Carlo methods are a general term for methods that use random numbers to perform simulations or numerical calculations. It was originally devised by Stanisław Ulam to explore the movement of neutrons through matter and named by John von Neumann.

  • Deep Reinforcement Learning: A method that can directly process high-dimensional data by combining deep learning and reinforcement learning. It enables learning in complex environments and is gaining attention in fields like autonomous driving and gaming.

  • Actor-Critic method: An important method in reinforcement learning that takes an approach of learning the policy and value function simultaneously. It learns both a policy (Actor) and a value function (Critic) at the same time. By performing policy improvement and value evaluation in parallel, it improves the stability and efficiency of learning. It is a family of reinforcement learning (RL) algorithms that combines policy-based RL algorithms such as policy gradient methods with value-based RL algorithms such as value iteration, Q-learning, SARSA, and TD learning.

  • DQN (Deep Q-Network): A method that fuses Q-learning with deep learning. It estimates Q-values directly from high-dimensional input data and has achieved superhuman play in games like Atari.

  • DNN training The process of training a model using a multi-layered neural network. It improves the model's prediction accuracy by propagating data forward, backpropagating the error, and optimizing the weights.
  • Neural Networks A computational model inspired by biological neural circuits, which transmits information from an input layer to an output layer and processes it in hidden layers. It has the ability to approximate complex functions and is widely used for image recognition, speech recognition, and more.
  • RNN (Recurrent Neural Network) A neural network specialized for processing time-series data or data with a sequence. The output of the hidden layer is also used as input for the next time step, enabling it to learn temporal dependencies.
  • LSTM (Long Short Term Memory) (a type of RNN) A type of RNN with the ability to learn long-term dependencies. It uses a gating mechanism to select and retain important information, allowing it to hold onto information over long periods, which is difficult for standard RNNs.
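Returning to the Q-learning update rule above, here is a minimal NumPy sketch; the toy environment, state/action spaces, and ε-greedy policy are assumptions for illustration:

```python
import numpy as np

n_states, n_actions = 5, 2
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def step(s, a):
    # Hypothetical environment: action 1 moves right, reward at the last state.
    s_next = min(s + a, n_states - 1)
    r = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # ε-greedy action selection: mostly exploit, sometimes explore.
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next, r = step(s, a)
        # Q-learning update: blend old value with reward + discounted best next value.
        Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (r + gamma * np.max(Q[s_next]))
        s = s_next

print(np.round(Q, 2))  # the learned Q-values favor moving right
```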

Activation Functions:

  • Sigmoid in Machine Learning

    f(x) = \frac{1}{1 + e^{-x}}
    • An S-shaped activation function that transforms an input value x to a range between 0 and 1. In neural networks, it is used when the output is interpreted as a probability (especially in classification tasks). It originates from the probabilistic interpretation used in logistic regression.
    • Used in binary classification problems (e.g., logistic regression, spam filtering).
  • ReLU (Rectified Linear Unit)

    f(x) = \max(0, x)
    • ReLU is a neural network activation function that outputs the input if it is greater than 0, and 0 otherwise. Sigmoid and Tanh functions suffered from the vanishing gradient problem. ReLU was introduced to solve this problem.
    • Widely used in models like CNNs and deep learning (strong in hidden layers), contributing to faster learning and mitigating the vanishing gradient problem.
  • Tanh (Hyperbolic Tangent)

    f(x) = \tanh(x)
    • A function that transforms input values to the range of -1 to 1. It is similar to the sigmoid function, but its output converges to the range of -1 to 1 instead of 0 to 1. It offers a more balanced range (-1 to 1) while behaving similarly to the sigmoid function, and is often used in RNNs.
    • Used in RNNs (for processing time-series data), especially as an activation function for hidden layers.
  • Softmax

    f(\mathbf{x})_i = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}}
    • The softmax function, or normalized exponential function, is a multi-dimensional extension of the sigmoid function. In multi-class classification problems, it is often used as the final activation function because it can convert the output of a neural network into a probability distribution.
    • Each output value converges between 0 and 1, and the sum of all outputs becomes 1. (A NumPy sketch of the four activation functions above follows at the end of this list.)
  • NLP Transformers

    • An architecture specialized for Natural Language Processing (NLP) tasks. It captures context using a self-attention mechanism. As a powerful alternative to traditional RNNs and LSTMs, it excels in parallel computation and can handle long text processing.
    • Used in the latest NLP models such as machine translation, text summarization, question answering, BERT, and GPT series.
  • GAN (Generative Adversarial Network) | Unsupervised Learning

    • A GAN is a model that generates data by training two neural networks (a generator and a discriminator) adversarially. The generator creates new data, and the discriminator determines if it is real or fake. The goal is to minimize a loss function.
    • Used for image generation (DeepFakes, Art Generation), data augmentation, and image restoration/transformation (image denoising, style transfer, etc.).
  • Embeddings

    • A technique for converting data such as language, images, and audio into a numerical format that computers can easily understand. It represents language data numerically, allowing the model to learn the semantic relationships of vocabulary.
    • Used for word similarity calculation (Word2Vec, GloVe), text classification, and recommendation systems.
  • One-hot Encoding

    • A method for converting categorical data into numerical vectors. Each category is represented as a vector with only one element as 1, and all others as 0.
    • Used when inputting categorical features into a neural network. Also used to digitize labels in machine learning classification problems.
  • Original Encoding

    • A method of converting input data features into numbers in their original state. It digitizes data without losing its original meaning or value, performing only the minimum necessary transformation.
    • Used when inputting numerical data as is into a machine learning model, or when standardization or normalization is required.
  • word2vec | Natural Language Processing

    • Word2Vec is a natural language processing technique for representing the meaning of words as numerical vectors (embeddings).
    • In this method, a dense vector (coordinates in a multidimensional space) is assigned to each word based on the context in which it is used. This allows the semantic relationships in language to be captured numerically.
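A minimal NumPy sketch of the four activation functions listed at the top of this section:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes each value to (0, 1)

def relu(x):
    return np.maximum(0.0, x)          # 0 for negatives, identity otherwise

def tanh(x):
    return np.tanh(x)                  # squashes each value to (-1, 1)

def softmax(x):
    e = np.exp(x - np.max(x))          # subtract the max for numerical stability
    return e / e.sum()                 # outputs are in (0, 1) and sum to 1

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), relu(z), tanh(z), softmax(z), sep="\n")
```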
Model Evaluation
  • Cosine Similarity A method for calculating similarity by measuring the angle between two vectors in a vector space. It is used to evaluate the similarity between items in text data and recommendation systems.
  • Precision and Recall Important evaluation metrics for imbalanced data. Precision indicates accuracy, while Recall indicates coverage, and the balance between the two is evaluated.
  • F1 Score The harmonic mean of Precision and Recall. It is particularly effective for model evaluation on imbalanced datasets.
  • ROC Curve A curve plotting the true positive rate against the false positive rate. It is used to intuitively evaluate a model's performance.
  • AUC (Area Under the Curve) The area under the ROC curve. The higher the AUC, the better the model's ability to correctly identify anomalies.
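A minimal scikit-learn sketch of the metrics above on a toy binary problem (the labels and scores are made up):

```python
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

y_true  = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred  = [1, 0, 0, 1, 0, 1, 1, 0]                   # hard class predictions
y_score = [0.9, 0.2, 0.4, 0.8, 0.1, 0.7, 0.6, 0.3]   # predicted probabilities

print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall:   ", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1:       ", f1_score(y_true, y_pred))         # harmonic mean of the two
print("ROC-AUC:  ", roc_auc_score(y_true, y_score))   # uses scores, not hard labels
```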

Loss Functions:

  • Mean Absolute Error (MAE) An evaluation metric for regression models, showing the average of the absolute errors between predicted and actual values. It is easy to intuitively understand the model's accuracy.
  • Mean Squared Error (MSE) An error evaluation metric for regression models. It emphasizes the magnitude of errors by squaring them and taking the average.
  • Logarithmic Loss (Log Loss) A metric for measuring the performance of classification models with probabilistic outputs. It takes a lower value the closer the probability is to the correct class.
  • Hinge Loss A loss function for classification problems, primarily used in Support Vector Machines (SVM). It is optimized to maximize the margin between classes.
  • Lift and Gain Charts Tools for evaluating model performance. They are used, especially in marketing and sales forecasting, to confirm the effectiveness of target extraction. They are used to visualize how effectively a model predicts the target class and how much it has improved compared to random prediction.
  • Calibration Curve A method for evaluating how reliable a model's probabilistic output is. It shows whether the predicted probabilities match the actual occurrence probabilities.

Model Interpretation/Explanation

model interpretability|Explainability AI

  • Cross-validation A technique for confirming a model's generalizability. It prevents overfitting and provides a stable evaluation of model performance by dividing the data into multiple subsets and repeating the training/validation process.
  • Nested cross-validation is suitable when optimizing hyperparameters: it prevents data leakage and gives a more reliable performance evaluation. It involves a double cross-validation loop (inner/outer).
  • What-If Tool Tests a model's performance against various inputs, analyzes the importance of data features, and visualizes the model's behavior across multiple models and datasets. An open-source tool from Google.
  • Learning Interpretability Tool (LIT) A tool for visually interpreting a model's predictions and learning process. It makes feature importance and model behavior explainable.
  • Integrated gradients | Google Cloud A method that quantifies the contribution of each feature to a model's prediction. It evaluates feature importance by varying the input linearly between a baseline (usually zero) and the actual input, and integrating the gradients along this path. It scales to large networks and feature spaces such as images, and it has become a popular interpretability technique because it applies to any differentiable model (images, text, structured data), is easy to implement, has a theoretical justification, and is computationally efficient. It has diverse use cases, such as understanding feature importance, identifying data skew, and debugging model performance. (A sketch follows this list.)
  • Shapley Explanation | Google Cloud A method that quantitatively shows how much each feature contributes to a model's prediction. Based on game theory, it provides a fair distribution of the prediction among the features. The Sampled Shapley method works well with meta-ensembles of trees and neural networks.
  • XRAI in Machine Learning | Google Cloud A technology that provides interpretation for the prediction results of deep learning models. It visually explains which parts influenced the prediction, especially for image and text data.
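A minimal TensorFlow sketch of the integrated gradients computation described above, for a differentiable model on a single 1-D input; the tiny Keras model and the zero baseline are assumptions for illustration:

```python
import tensorflow as tf

def integrated_gradients(model, x, baseline=None, steps=50):
    """Approximate the path integral of gradients from baseline to x."""
    if baseline is None:
        baseline = tf.zeros_like(x)  # a zero baseline is the common default
    # Interpolate inputs along the straight line between baseline and x.
    alphas = tf.reshape(tf.linspace(0.0, 1.0, steps + 1), (-1, 1))
    path = baseline[None, :] + alphas * (x - baseline)[None, :]
    with tf.GradientTape() as tape:
        tape.watch(path)
        preds = model(path)
    grads = tape.gradient(preds, path)
    # Trapezoidal approximation of the average gradient along the path.
    avg_grads = tf.reduce_mean((grads[:-1] + grads[1:]) / 2.0, axis=0)
    return (x - baseline) * avg_grads  # one attribution per input feature

# Example with a hypothetical small regression model:
model = tf.keras.Sequential([tf.keras.layers.Dense(8, activation="relu"),
                             tf.keras.layers.Dense(1)])
x = tf.constant([0.5, -1.2, 3.0])
print(integrated_gradients(model, x))
```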
Other Technologies
| Method | Regularization Type | Feature Selection | Coefficient Behavior | Main Advantage | Main Disadvantage |
| --- | --- | --- | --- | --- | --- |
| Lasso Regression | L1 | Possible | Shrinks to zero | Automatically performs feature selection; only important features remain. | May not select strongly correlated features. |
| Ridge Regression | L2 | Not possible | Becomes smaller | Shrinks coefficients while using all features. | Does not perform feature selection; uses all features. |
| Elastic Net | L1 + L2 | Possible | Shrinks to zero / becomes smaller | Best of both Lasso and Ridge; handles strongly correlated features. | Hyperparameter tuning can be difficult. |
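A minimal scikit-learn sketch contrasting the coefficient behavior in the table above (L1 drives coefficients exactly to zero, L2 only shrinks them); the synthetic dataset is an assumption for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge

# 10 features, but only 3 actually drive the target.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

for model in (Lasso(alpha=1.0), Ridge(alpha=1.0), ElasticNet(alpha=1.0, l1_ratio=0.5)):
    model.fit(X, y)
    zeroed = int((model.coef_ == 0).sum())
    print(f"{type(model).__name__:>10}: {zeroed} coefficients driven exactly to zero")
```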
  • Federated Learning A distributed machine learning method. It protects privacy by training models locally on each device without centralizing data, and then aggregating the results.
  • Collaborative Filtering using Matrix Factorization A method for predicting unknown items based on user-item interactions. It uses matrix factorization to extract latent features of users and items, which are then used in recommendation systems.
  • tf.distribute.Strategy (TensorFlow Distributed Strategy) An API for performing distributed training in TensorFlow. It is a method for efficiently training large datasets using multiple GPUs, TPUs, or a distributed environment.
  • Maximum Likelihood: An estimation method in statistics. It estimates the parameters under which the observed data has the highest probability of occurring. It is widely used for parameter estimation in probability models and for characterizing the population that produced the observed data.

    L(\theta) = P(X = \{x_1, x_2, \ldots, x_n\} \mid \theta) = \prod_{i=1}^{n} f(x_i; \theta)

    The observed data X is the set of actually observed data points. f(x; θ) is the assumed probability distribution that the data follows, characterized by the parameter θ. The likelihood function L(θ) gives the probability of observing the data X under the parameter θ, and the product symbol ∏ means the probabilities of the individual data points are multiplied together. The likelihood function is used to estimate the parameter θ for which the observed data is most "likely" to be obtained.
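A minimal SciPy sketch of maximum likelihood estimation: fit the mean and standard deviation of a normal distribution by minimizing the negative log-likelihood (the data are simulated for illustration):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=500)  # simulated observations

def neg_log_likelihood(params):
    mu, sigma = params
    # -log L(theta) = -sum(log f(x_i; theta)) for the normal density.
    return -np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                   - (data - mu) ** 2 / (2 * sigma**2))

result = minimize(neg_log_likelihood, x0=[0.0, 1.0],
                  bounds=[(None, None), (1e-6, None)])
print("MLE estimates (mu, sigma):", result.x)  # close to (3.0, 2.0)
```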

Parametric and Nonparametric | Classification of ML models

Parametric Machine Learning Algorithms

  • Features: Algorithms that assume a form for the function to be learned in advance and learn parameters based on that assumption. The number of parameters is fixed and does not change with the amount of data.
  • Learning: Fast to learn with little data. Flexibility: Constrained by the assumed function form, not suitable for complex problems.
  • Representative examples:
    • Linear Regression
    • Logistic Regression
    • Naive Bayes
    • Simple Neural Networks

Nonparametric Machine Learning Algorithms

  • Features: Algorithms that do not assume a function form but flexibly learn the function from the data. The number of parameters depends on the amount of data, and more complex functions can be learned with more data.
  • Learning: Requires large amounts of data and computational resources, learning speed is slow. Flexibility: Can adapt to complex function forms.
  • Representative examples:
    • k-Nearest Neighbors
    • Decision Trees
    • Support Vector Machines (SVM)

Lazy learning | Classification of models

Lazy learning is a method where the model does not learn from the training data in advance but makes inferences by using the data each time a prediction is made. A characteristic feature is that it takes little time to build the model, as it only involves storing and referencing the training data. Therefore, very little processing is done during the learning phase, and the computational load may increase at the time of prediction. Representative examples:

  • k-Nearest Neighbors (k-NN)
  • Local Regression
  • Lazy Naive Bayes
  • Lazy Decision Trees

Overview of Vertex AI | Start here first

  • With AutoML, you can train tabular, image, text, or video data without writing code or preparing data splits. You can deploy these models for online prediction or query them directly for batch prediction.
  • With custom training, you have full control over the training process, including using any ML framework, writing your own training code, and choosing hyperparameter tuning options. You can import your custom-trained model to the Model Registry and deploy it to an endpoint for online prediction using a pre-built or custom container. Or you can query it directly for batch prediction.
  • With Model Garden, you can explore, test, customize, and deploy Vertex AI and select open-source models and assets....

Model Garden | Vertex AI | Must-read model list 👀

Choose from world-class models from Google (Gemini), third parties (Anthropic's Claude model family), and open models (Meta's Llama 3.1) that meet Google's evaluation criteria. With over 160 curated models, each delivering best-in-class performance in its category, customers can access high-performing foundation models that best suit their business needs.

Vertex Explainable AI | Explainable AI

Supported model types: Any TensorFlow model where you can provide the embeddings (latent representation) of an input is supported. Tree-based models like decision trees are not supported (as they are inherently explainable). Models from other frameworks, such as PyTorch or XGBoost, are not yet supported. For DNNs, it's often the case that the upper layers (closer to the output layer) are assumed to have learned "meaningful" things, so the second-to-last layer is often chosen for embeddings. You can test with a few different layers and examine the resulting examples to choose the layer based on quantitative (class match) or qualitative (looks reasonable) measures.

Overview of Vertex AI Experiments

You can track and evaluate your model's aggregate performance during training runs against a test dataset. This feature helps you understand your model's performance characteristics: how a particular model performs overall, where it doesn't perform well, and where your model excels.

Vertex AI Workbench

Vertex AI Workbench is a Jupyter notebook-based development environment for the entire data science workflow. You can interact with Vertex AI and other Google Cloud services from within a Jupyter notebook in your Vertex AI Workbench instance. Vertex AI Workbench integrations and features can help you access and explore your data, accelerate data processing, schedule notebook runs, and more.

  • Access and explore your data in a Jupyter notebook by using BigQuery and Cloud Storage integrations.
  • Automate recurring updates to your model by using scheduled notebook code execution on Vertex AI.
  • Process data quickly by running your notebook on a Dataproc cluster.

Vertex AI Model Monitoring

With Vertex AI Model Monitoring, you can run monitoring jobs on-demand or on a schedule to track the quality of your tabular models. If you have set up alerts, Vertex AI Model Monitoring sends you a notification when a metric surpasses a threshold you specify. For example, you might have a model that predicts a customer's lifetime value. As customer habits change, so do the factors that predict their spending. So the features and feature values that you previously used to train your model might no longer be relevant for your current predictions. This deviation in data is called drift.

Vertex AI Vizier

Vertex AI Vizier is a black-box optimization service that helps you tune hyperparameters in complex machine learning (ML) models. When an ML model has many different hyperparameters, it can be difficult and time-consuming to tune them manually. Vertex AI Vizier optimizes your model's output by tuning the hyperparameters for you. The default algorithm uses Bayesian optimization to more efficiently search the parameter space and derive the optimal solution. Black-box optimization is the optimization of a system that meets either of the following conditions:

  • It has no known objective function to evaluate.
  • It is too expensive to evaluate the objective function (often due to the system's complexity).

Vizier (English: advisor)

Occupancy analytics | Vertex AI

The occupancy analytics model can be used to count the number of people or vehicles, based on specific inputs that you add to the video frames. Compared with the person and vehicle detection model, the occupancy analysis model offers advanced features. These features are active zone count, line cross count, and dwell detection.

Vertex AI Agent Builder

Vertex AI Agent Builder lets developers, even those with limited ML skills, tap into the power of Google's foundation models, search expertise (semantic search), and conversational AI technologies to create enterprise-grade generative AI applications.

TensorFlow Data Validation

It includes reviewing descriptive statistics, inferring a schema, checking for and fixing anomalies, and checking for drift and skew in datasets. It's important to understand your dataset's characteristics, including how it might change over time in your production pipeline. It's also important to look for anomalies in your data, and to compare your training, evaluation, and serving datasets to make sure that they're consistent.

TensorFlow I/O

TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. It provides IO support (e.g., format conversion) for a number of systems and cloud vendors.

TensorFlow Model Analysis | TFMA

The goal of TensorFlow Model Analysis is to provide a mechanism for model evaluation in TFX. Using TensorFlow Model Analysis, you can perform model evaluations in your TFX pipeline and view the resulting metrics and plots in a Jupyter notebook. Specifically, it can provide the following:

TensorFlow Lite (TFLite)

LiteRT (short for Lite Runtime), formerly known as TensorFlow Lite, is Google's high-performance runtime for on-device AI. You can find ready-to-run LiteRT models for a wide range of ML/AI tasks, or you can use the AI Edge conversion and optimization tools to convert TensorFlow, PyTorch, and JAX models to the TFLite format and run them. Multi-platform support: Android devices, iOS devices, embedded Linux, and microcontrollers.

Get predictions from a forecast model | Vertex AI | predictions for a forecast model

To make a batch prediction request, use the Google Cloud console or the Vertex AI API. Your input data source must be a CSV object stored in a Cloud Storage bucket or a BigQuery table. AutoML forecast models do not support endpoint deployment or online predictions; to request online predictions from a forecast model, use the Tabular Workflow for Forecasting.

Use a custom container for prediction | Vertex AI | custom container for prediction

To customize how Vertex AI serves online predictions from your custom-trained model, you can specify a custom container instead of a pre-built container when you create a Model resource. With a custom container, Vertex AI runs an arbitrary Docker container of your choice on each prediction node.

Hello image data: Train an AutoML image classification model

Incremental training (incremental learning) starts from a previously trained model instead of training from scratch, which usually shortens training time.

Manage BigQuery ML models in Vertex AI

By registering your BigQuery ML models with the Vertex AI Model Registry, you can manage your models alongside your Vertex AI models without having to export them. After you have your model in the Model Registry, you can use a single interface to version, evaluate, and deploy your models for online prediction without a serving container. You can register a model to the Model Registry using the MODEL_REGISTRY option in the CREATE MODEL statement.

Tips for reducing memory usage | TensorFlow - TPU Troubleshooting

  • Check for excessive tensor padding
    • Tensor padding is an operation performed to adjust the size of input data so that the model can process it properly. It is mainly used in convolutional layers and RNNs, with zero padding being the most common.
  • Use the bfloat16 format
  • If the model is too large, using TensorFlow's experimental model parallelism may help accommodate the model size.

Vertex AI Feature Store

Vertex AI Feature Store is a fully managed, native Feature Store service that's an integral part of Vertex AI. It streamlines your ML feature management and online serving processes. You can manage your feature data in BigQuery tables or views and serve online from BigQuery data sources. Vertex AI Feature Store provisions resources that let you set up online serving by specifying feature data sources. It then acts as a metadata layer that interacts with the BigQuery data sources to serve the latest feature values directly from BigQuery for online prediction with low latency.

Custom-Prediction-Routines | Vertex AI

With custom prediction routines (CPR), you can easily build a custom container with preprocessing and postprocessing code without having to configure an HTTP server or build a container from scratch. You might want to use preprocessing to normalize or transform your inputs, call an external service for additional data, or use postprocessing to format your model's prediction or run business logic.

  • You don't have to write a model server or a Dockerfile. A model server (an HTTP server that hosts your model) is provided for you.
  • You can deploy and debug your model locally, which enables a faster iteration cycle during development.
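A sketch of what a CPR Predictor subclass can look like, following the load/preprocess/predict/postprocess interface from Google's CPR samples; the preprocessing logic and the artifact filename are hypothetical:

```python
import pickle

import numpy as np
from google.cloud.aiplatform.prediction.predictor import Predictor
from google.cloud.aiplatform.utils import prediction_utils

class MyPredictor(Predictor):
    def load(self, artifacts_uri: str):
        """Download the model artifacts and deserialize the model."""
        prediction_utils.download_model_artifacts(artifacts_uri)
        with open("model.pkl", "rb") as f:   # hypothetical artifact name
            self._model = pickle.load(f)

    def preprocess(self, prediction_input: dict) -> np.ndarray:
        # Hypothetical preprocessing: scale raw instances before inference.
        instances = np.asarray(prediction_input["instances"])
        return instances / 255.0

    def predict(self, instances: np.ndarray) -> np.ndarray:
        return self._model.predict(instances)

    def postprocess(self, prediction_results: np.ndarray) -> dict:
        # Format the output the way the client expects.
        return {"predictions": prediction_results.tolist()}
```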

Using managed datasets | Vertex AI

Learn how to use Vertex AI managed datasets to train custom models. With managed datasets, you can:

  • Manage your datasets centrally.
  • Easily create labels and multiple annotation sets.
  • Create human-labeling tasks by using integrated data labeling.
  • Track the lineage of your models for governance and iterative development.
  • Use the same dataset to train both AutoML and custom models, and compare model performance.
  • Generate and visualize statistics for your data.
  • Automatically split your data into training, test, and validation sets.

Build your own retrieval-augmented generation | Vertex AI Agent Builder

The DAG diagram is helpful.

ExampleGen TFX | TensorFlow

The ExampleGen component, one of the TFX pipeline components, ingests data into the TFX pipeline. It reads data from external files or services and generates Examples that other TFX components read. It also splits the dataset in a consistent way; the split can be changed by configuration. At the same time, it shuffles the dataset according to machine learning best practices. Input: data from external sources such as CSV, TFRecord, and BigQuery. Output: tf.Example records.
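A minimal TFX sketch of wiring up a CSV-based ExampleGen (the input path is hypothetical):

```python
from tfx import v1 as tfx

# CsvExampleGen ingests CSV files from a directory, shuffles them, splits
# them into train/eval, and emits tf.Example records for downstream components.
example_gen = tfx.components.CsvExampleGen(input_base="/path/to/csv_dir")

# Downstream components consume its output channel, e.g.:
# statistics_gen = tfx.components.StatisticsGen(examples=example_gen.outputs["examples"])
```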

AI accelerator | google cloud | Official

An AI accelerator is dedicated hardware or software aimed at speeding up AI processing. It mainly streamlines deep learning and machine learning training and inference.

  1. TPU (Tensor Processing Unit)
    • Dedicated hardware developed by Google to accelerate Deep Learning.
    • Models with a high proportion of matrix calculations.
    • TPUs are not recommended for workloads that require high-precision arithmetic and are recommended for models that train for weeks or months (from a practice exam).
    • Compared to GPUs, they offer significantly better performance and cost efficiency for large-scale training.
  2. GPU (Graphics Processing Unit)
    • Has high parallel computing capabilities, used for training AI models / NVIDIA is mainstream.
    • GPUs suit deep learning workloads that require high-precision arithmetic; distributing training across multiple instances gives maximum flexibility in fine-tuning the accelerator selection to minimize execution time (from a practice exam).
  3. CPU
    • Rapid prototyping that requires maximum flexibility.
    • Simple models that do not take long to train.
    • Small-scale models with small actual batch sizes.
    • Models that contain many custom TensorFlow operations written in C++.

Status of Vertex AI Scheduler API | Vertex AI

ACTIVE / PAUSED / COMPLETED

ConditionalParameter | Vertex AI | Cost reduction

The ConditionalParameterSpec object lets you add a hyperparameter to a trial when the value of a parent hyperparameter matches a specified condition. For example, you can define a hyperparameter tuning job that finds the optimal model by using either linear regression or a deep neural network (DNN). To specify the training method in your tuning job, you define a categorical hyperparameter named training_method with LINEAR_REGRESSION and DNN. When training_method is LINEAR_REGRESSION, the tuning job must specify a learning rate hyperparameter. When training_method is DNN, the tuning job must specify a learning rate and the number of hidden layers.
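A sketch of roughly what that parameter definition looks like in the Vertex AI REST StudySpec, written as a Python dict; field names follow the v1 API, but treat the structure as an assumption to verify against the reference:

```python
# Sketch of a StudySpec categorical parameter with conditional children.
training_method = {
    "parameterId": "training_method",
    "categoricalValueSpec": {"values": ["LINEAR_REGRESSION", "DNN"]},
    "conditionalParameterSpecs": [
        {
            # learning_rate applies to both training methods ...
            "parentCategoricalValues": {"values": ["LINEAR_REGRESSION", "DNN"]},
            "parameterSpec": {
                "parameterId": "learning_rate",
                "doubleValueSpec": {"minValue": 1e-5, "maxValue": 1e-1},
                "scaleType": "UNIT_LOG_SCALE",
            },
        },
        {
            # ... but hidden_layers is only added to a trial when the
            # parent hyperparameter takes the value "DNN".
            "parentCategoricalValues": {"values": ["DNN"]},
            "parameterSpec": {
                "parameterId": "hidden_layers",
                # int64 bounds are serialized as strings in the JSON API.
                "integerValueSpec": {"minValue": "1", "maxValue": "10"},
                "scaleType": "UNIT_LINEAR_SCALE",
            },
        },
    ],
}
```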

Sidecar mode and ESP | Cloud Endpoints

  • Sidecar mode: A method where a proxy (Extensible Service Proxy (ESP)) that provides API management functions runs in the same container as each application instance.
  • ESP: A proxy that provides API request authentication, traffic management, monitoring, logging, etc. (Summarized by GPT).
API Management System (Sidecar Mode Configuration Example)

├── Application Service
│ ├── Application Logic (API)
│ └── Data Storage, External Services, etc.

├── ESP (API Management Proxy)
│ ├── Authentication Function
│ ├── Traffic Control
│ ├── Monitoring Function
│ ├── Logging Function
│ └── Rate Limiting

└── Other Components
├── Request Gateway (Accepts API requests)
└── API Gateway (Coordinates proxy and application service)

Preventing overfitting

A concern when training BigQuery ML models is overfitting. Overfitting is when a model matches the training data too closely, which results in poor performance on new data. BigQuery ML supports two methods for preventing overfitting: early stopping and regularization.
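A minimal sketch of setting those options in a BQML CREATE MODEL statement, run via the BigQuery Python client; the dataset, table, and label names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()

# early_stop halts training when the loss improvement falls below
# min_rel_progress; l1_reg / l2_reg add regularization penalties.
query = """
CREATE OR REPLACE MODEL `my_dataset.churn_model`
OPTIONS (
  model_type = 'logistic_reg',
  input_label_cols = ['churned'],
  early_stop = TRUE,
  min_rel_progress = 0.005,
  l2_reg = 0.1
) AS
SELECT * FROM `my_dataset.training_data`
"""
client.query(query).result()  # waits for training to finish
```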

Tune hyperparameters | Overfitting

Dropout rate: Dropout layers are used for regularization in models. They define the fraction of the input to drop as a countermeasure against overfitting. Recommended range: 0.2 to 0.5. Learning rate: how much the neural network weights change between iterations (the step size). A high learning rate can cause large fluctuations in the weights and may prevent finding the optimal values. A low learning rate is fine, but it requires more iterations to converge. We recommend starting with 1e-4. If training is very slow, increase this value; if the model is not learning, try decreasing the learning rate.
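A minimal Keras sketch wiring in both knobs, using the recommended starting values above (the layer sizes are arbitrary):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dropout(0.3),   # drop 30% of inputs (recommended range 0.2-0.5)
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Start with the recommended learning rate of 1e-4 and adjust from there.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy")
```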

Prediction log types | VertexAI | Online prediction logging

  • Container logging: The prediction nodes log their stdout and stderr streams to Cloud Logging. These logs are necessary for debugging. On v1 service endpoints, container logging is enabled by default; you can disable it when you deploy a model, and disable or enable it when you modify a deployed model.
  • Access logging: Information like the timestamp and latency of each request is logged to Cloud Logging. On both v1 and v1beta1 service endpoints, access logging is disabled by default; you can enable it when you deploy a model to an endpoint.
  • Request-response logging: A sample of online prediction requests and responses is logged to a BigQuery table. To enable request-response logging, you create or patch your prediction endpoint.

Choosing between the Healthcare Natural Language API and AutoML Entity Extraction for Healthcare | Cloud Healthcare API

The Healthcare Natural Language API provides a pre-trained natural language model that extracts medical concepts and relationships from medical text. The Healthcare Natural Language API maps text to a predefined set of medical knowledge categories. AutoML Entity Extraction for Healthcare lets you create a custom entity extraction model trained with your own annotated medical text (custom labels) and your own categories.

Dataflow ML

Dataflow ML combines Dataflow with the Apache Beam RunInference API. The RunInference API lets you define the characteristics and properties of your model and then pass that configuration to the RunInference transform. This feature lets you run your model within your Dataflow pipeline without needing to know the implementation details of the model. You can choose the framework that works best for your data, such as TensorFlow or PyTorch.

RunInference API | Dataflow

The RunInference API lets you build pipelines that contain multiple models. Multi-model pipelines are useful for tasks like running A/B tests or building ensembles to solve business problems that require more than one ML model. When building pipelines with multiple models, you can use one of two patterns:

  • The A/B branch pattern: A portion of the input data is sent to one model, and the rest of the data is sent to a second model.
  • The sequence pattern: The input data passes through two models in sequence.
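A minimal Apache Beam sketch of the A/B branch pattern with two sklearn model handlers; the model paths are hypothetical, so this only runs once the pickled models exist:

```python
import apache_beam as beam
import numpy as np
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy

# Two handlers pointing at two pickled sklearn models (paths are hypothetical).
handler_a = SklearnModelHandlerNumpy(model_uri="gs://my-bucket/model_a.pkl")
handler_b = SklearnModelHandlerNumpy(model_uri="gs://my-bucket/model_b.pkl")

with beam.Pipeline() as pipeline:
    examples = pipeline | "Create" >> beam.Create(
        [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
    )

    # A/B branch pattern: deterministically route roughly half of the
    # elements to each model.
    branch_a, branch_b = examples | "Split" >> beam.Partition(
        lambda x, num_partitions: int(x.sum()) % 2, 2
    )

    preds_a = branch_a | "ModelA" >> RunInference(handler_a)
    preds_b = branch_b | "ModelB" >> RunInference(handler_b)

    _ = (preds_a, preds_b) | beam.Flatten() | beam.Map(print)
```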

Scikit-learn | CPU-based by default -> GPU support now possible

  • Scikit-learn is primarily optimized for CPU-based computation and is suitable for small to medium-sized datasets. Performance may be limited for large datasets or when real-time responses are required.
  • If you want to leverage GPUs, consider NVIDIA's cuML library, which has an API similar to scikit-learn and can be run in environments like a Deep Learning VM (DLVM).
  • Since 2023, a growing but still limited set of scikit-learn estimators can run on a GPU if the input data is provided as PyTorch or CuPy arrays and scikit-learn is configured to accept such inputs.
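A hedged sketch of the cuML drop-in style (it mirrors the scikit-learn API but requires an NVIDIA GPU and the RAPIDS packages; the synthetic dataset is for illustration):

```python
# RAPIDS cuML mirrors the scikit-learn API but runs on the GPU.
from cuml.datasets import make_classification
from cuml.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)

clf = RandomForestClassifier(n_estimators=100)
clf.fit(X, y)                  # training runs on the GPU
print(clf.predict(X[:5]))
```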

Request-Response Logging | samplingRate | Vertex AI

For dedicated and Private Service Connect endpoints, you can use request-response logging to log request-response payloads smaller than 10 MB for TensorFlow, PyTorch, sklearn, and XGBoost models.

  • samplingRate: The fraction of requests and responses to log. Set to a value greater than 0 and less than or equal to 1. For example, to log all requests, set this value to 1, and to log 10% of requests, set it to 0.1.
  • Lowering samplingRate reduces monitoring costs by considering fewer data points for monitoring, while still enabling quick drift detection.
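A sketch of enabling request-response logging at endpoint creation with the google-cloud-aiplatform SDK; the project, dataset, and table names are hypothetical:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Sample 10% of online prediction traffic into a BigQuery table.
endpoint = aiplatform.Endpoint.create(
    display_name="churn-endpoint",
    enable_request_response_logging=True,
    request_response_logging_sampling_rate=0.1,   # log 10% of requests
    request_response_logging_bq_destination_table=(
        "bq://my-project.logging_dataset.request_response_log"
    ),
)
```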

March 30, 2025 - Failed Exam Log

## Exam Overview
- 50 questions / 120 minutes
- Barely completed the first pass in 105 minutes, only had time to review about 2 questions.

## Exam Impressions
- Difficult. And many English words I didn't understand, which made it harder.
- There were more complex questions than I expected about implementing, training, and improving ML with Google Cloud services, rather than just fundamental ML understanding.
- I especially got the impression there were many questions about implementing ML pipelines.
- It's frustrating. Especially since I had gone through the practice questions twice, scored around 70% on the old questions, and had done video-based learning.
- I also feel that there weren't many similar questions from the practice sets.

## For Next Time
It's possible my choice of practice questions was wrong from the start, but for next time, I'll first review the existing old practice questions and perfect the practice exam. I'll also take this as a chance to deepen my understanding of fundamental ML topics.

## Topic Trends
- Many questions on ML pipelines / data processing.
- Questions about API selection for recent models like Gemini and Llama were also included.
- TensorFlow I/O
- A few questions on technology selection from pipeline to monitoring.
- Many questions on selecting targets for performance tuning (TPU/GPU, etc.).
- I remember a recommendation question where one answer choice mentioned only Matrix Factorization (from "Collaborative Filtering using Matrix Factorization").
- The softmax function appeared many times.
- Many questions that tested understanding of classification vs. regression problems while selecting a model.
- Many questions on explainability.
- Questions on data confidentiality.
- The word "ExampleGen" appeared frequently.
- RunInference API also appeared frequently.
- The word "accelerator" was also frequent.
- The word "Metadata" also came up quite a bit, in questions about pipeline configuration.

Table from the failed exam report:

| Section | Approximate % of Scored Questions | Section Performance |
| --- | --- | --- |
| Section 1: Architecting low-code AI solutions | 13% | Meets |
| Section 2: Collaborating within and across teams to manage data and models | 14% | Does Not Meet |
| Section 3: Scaling prototypes into ML models | 18% | Does Not Meet |
| Section 4: Serving and scaling models | 20% | Borderline |
| Section 5: Automating and orchestrating ML pipelines | 22% | Does Not Meet |
| Section 6: Monitoring AI solutions | 13% | Does Not Meet |

April 15, 2025: Submitted Improvement Request to Udemy Practice Question Author

  • April 16, 2025: Submitted the following request on Udemy
  • April 16, 2025: The author responded with thanks and offered a free coupon.
Subject: Improvement Request regarding Google Cloud Certification Professional Machine Learning Practice Exam