
Professional Machine Learning Engineer

🌸 Passed: April 18, 2025 | ☔️ Failed: March 30, 2025

Exam Information
  • Exam Name: Professional Machine Learning Engineer
  • Date: 18 April 2025
  • Time: 01:30 PM

Post-Exam Reflections | 2025/04/19

Exam Information / Impressions
  • 50 questions, 120 minutes, in English.
  • The exam difficulty felt easier than the first time, perhaps because it was my second attempt. Still, it felt like a high-difficulty exam that combines foundational ML knowledge with Google Cloud's ML service use cases.
  • It is undoubtedly the most difficult among the Professional exams.
  • I finished the first pass in about 110 minutes, just barely, and could only review a few questions.
  • I only flagged about one question to review later, so I feel lucky to have passed based on my first-pass answers.
  • My feeling is that my score was just barely around 70%.
Question Trends

Fundamental Knowledge:

  • Multiple questions about model interpretability / Explainable AI
    • Integrated gradients
    • Shapley Explanation
  • Many complex questions asking about the proper use of regression vs. classification.
  • Many questions about tool selection for image recognition / anomaly detection.
  • Many questions related to confusion matrices.
  • Multiple incorrect answer choices included descriptions of Matrix Factorization (for recommendations).

Service Knowledge:

  • BigQuery ML in general
  • Vertex AI in general
  • Model Garden
  • Vertex AI Workbench
  • Vertex AI Vizier
  • TensorFlow Lite (multiple questions)
  • Questions about selecting TensorFlow AI accelerators (TPU/GPU/CPU).
  • Questions about technology selection/design for data pipelines / CI/CD.
  • Questions about migrating/porting from Scikit-learn to Google Cloud.
  • Questions about adjusting Request-Response Logging / sampling rates.

Additional Notes:

  • I have the impression there weren't many questions about Reinforcement Learning.

TODOs for Exam Day (Successful practices from PSE / PNE / PDE / PCA exams) ⭐️

Day Before

  • Get a good night's sleep
    • Set up eye mask, earplugs, and pillow

On the Day

  • Wake up by 9 AM (important to be well-rested)
  • Print the exam confirmation email
    • Forward the email to the app
    • Print at a convenience store
  • Do a final review at a café
    • Feeling like Doutor near Sapporo Station?
      • Review incorrect answers from the official practice exam.
      • Review weak areas.
      • Go through the list of weak areas. Build a mental image.
      • Review incorrect answers from practice questions.
  • Take a 10-minute nap before arriving at the venue to fully refresh the brain.
    • Also get enough sugar.
  • Arrive at the venue at least 30 minutes before the test starts and complete check-in.
    • Be mindful of reading the options first.
    • Be mindful of saving time for review.
    • If I can't understand the English or if a question is too difficult, have the courage to skip it.

🔥 Strategy for the Exam 🔥

Learning Strategy:
  • Reference Materials
  • General Learning Materials
  • Old Practice Questions | Focus on reviewing incorrect answers (2025/04/07〜)
    • Google Cloud Certified Professional Machine Learning Engineer Practice Tests
    • 1st Round:
      • Free Test|2025/03/12|85%
      • Practice Test 1|2025/03/10|40%
      • Practice Test 2|2025/03/12|40%
      • Section Tests|2025/03/17|Around 50%
    • Review:
      • Free Test
      • Practice Test 1|2025/03/11
      • Practice Test 2
      • Section Tests|Review by re-solving problems|Scoring 70-80% on average
    • 2nd Round:
      • Free Test|2025/03/21|95%
      • Practice Test 1|2025/03/21|65%
      • Practice Test 2|2025/03/21|75%
      • Section Tests|
    • 3rd Round (study after failing):
      • Free Test|2025/04/05|90%
      • Practice Test 1|2025/04/04|76% ⤴︎
      • Practice Test 2|2025/04/04|73% ⤵️
      • Section Tests|2025/04/03|Avg over 80%
    • Review of weak areas: organize as needed
      • 2025/03/17
      • 2025/03/22
    • Practice Exam
  • New Practice Questions (2025/04/07〜)
    • Google GCP ML Engineer Certification Practice Updated Exam
      • These strongly reflect the latest trends; I started from Test 5 and worked backwards.
    • 1st Round: (the test order got mixed up, so dates are approximate)
      • Practice Test 1|2025/04/15|50%
      • Practice Test 2||40%
      • Practice Test 3||30%
      • Practice Test 4|2025/04/09|37%
      • Practice Test 5|2025/04/08|55%
    • 2nd Round:
      • Practice Test 1||
      • Practice Test 2|2025/04/16|30%
      • Practice Test 3||
      • Practice Test 4||
      • Practice Test 5||
    • Review:
      • Practice Test 1|2025/04/15|Using Perplexity AI is convenient.
      • Practice Test 2|2025/04/16|Consider the 2nd round as review.
      • Practice Test 3|2025/04/17|Review is going well.
      • Practice Test 4|2025/04/17|Review is going well.
      • Practice Test 5|2025/04/17|Going well, I guess. Many incorrect questions, which is a pain.

Weak Areas

Official References:

Useful Sites:

ML Pipeline Design Diagrams


Important English Vocabulary

Anomaly

In everyday English, 'anomaly' means 'abnormality' or 'exception'; in ML it refers to a data point that deviates markedly from the expected pattern or distribution, as in anomaly detection.

Annotation

In English, it means 'note' or 'explanation'; in the context of ML, it is used to mean 'labeling' or 'additional information for data'.

Skew

English meaning: 'tilt' or 'distortion'. A state where something is not straight but slanted. ML meaning: The distribution of data is biased. Especially used when a normal distribution is distorted.

Drift

English meaning: movement, trend, bias. ML meaning: A phenomenon where the model or data distribution changes over time, leading to a decline in prediction accuracy. Especially refers to when the distribution of training data and actual data diverge.

Curation

English meaning: 'Curation' mainly refers to the act of organizing, selecting, and managing information or items. ML meaning: 'Curation' refers to the process of selecting, organizing, and managing the quality of a training dataset.

Pickle

English meaning: 'Pickle' refers to preserving food (especially vegetables or fruits) in vinegar, salt, etc. ML meaning: A 'pickled model' is a trained model that has been serialized with Python's pickle library and saved for later reuse.

Heuristic

English meaning: 'Heuristic' is a method for quickly finding a solution when solving problems or making decisions, based on intuition or rules of thumb. ML meaning: 'Heuristic' is an approximate method or strategy for finding an optimal solution, guiding to a solution efficiently while saving computational resources.

Oscillations

English meaning: 'vibration' or 'swaying'. A phenomenon where an object moves back and forth repeatedly around a central point. ML meaning: A phenomenon in a model's prediction or learning process where the error or value does not stabilize and repeatedly fluctuates up and down. Often used in relation to a learning rate that is too high or overfitting. Solution: Lowering the learning rate can suppress prediction fluctuations.

Holdout data | Machine Learning Glossary

English meaning (Holdout): a person who refuses to cooperate or compromise; an act of resistance or refusal. ML meaning: Samples that are intentionally not used during training ('held out'). Example: validation dataset and test dataset (data intentionally excluded from training).

Model Interpretability | Explainable AI
  • Attention Mechanisms: Improves interpretability, especially in NLP and image recognition, by showing which parts of the input the model is focusing on. Example: Visualizing weights on words or image regions in machine translation or image captioning to identify the basis for a decision.
  • SHAP (SHapley Additive exPlanations): Quantifies the impact of each feature on a prediction based on game theory, providing an explanation for the prediction. Example: Quantifying the contribution (Shapley value) of each feature to an individual prediction and explaining black-box models. (A minimal usage sketch follows at the end of this list.)
  • Linear Regression: Since features and the predicted value have a linear relationship, the basis for the prediction is easy to interpret from the feature coefficients. Example: Interpreting the direct impact on the result using feature coefficients and prediction modeling.
  • Decision Tree: Because the branching conditions are explicit and the prediction process can be traced through the tree structure, the decision-making process is easy to understand. Example: Classification/regression based on an explicit tree structure of branching rules, and visualization of the decision-making process.
  • Logistic Regression: In binary classification, the impact of feature weights on class prediction is interpretable. Example: Evaluating the impact of each feature on class prediction using odds ratios in binary classification problems.
  • k-Nearest Neighbors (k-NN): Since it bases predictions for new data on the nearest training data, it can show specific examples that contributed to the prediction. Example: Classification/regression based on neighboring data points, and presentation of similar instances as the basis for prediction.
  • Generalized Additive Models (GAM): Allows for independent evaluation of the effect of each feature on the prediction, making interpretation easy even for complex relationships. Example: Interpreting the non-linear effects of each feature using smoothing functions, etc., and predictive modeling assuming additivity.
  • Random Forest: A model that combines multiple decision trees, which helps in understanding predictions by evaluating feature importance. Example: High-accuracy ensemble prediction and identification of predictive factors through feature importance (e.g., mean impurity decrease).
  • Grad-CAM (Gradient-weighted Class Activation Mapping): Visualizes the regions of an image that a CNN is focusing on when predicting a specific class, showing the basis for the image recognition model's decision. Example: Visualizing the basis for a decision in CNN-based image classification by creating a heatmap of relevant regions using gradient information.
  • LIME (Local Interpretable Model-agnostic Explanations): Explains the decisions of black-box models by approximating the predictions of a complex model with a locally interpretable model. Example: Interpreting individual predictions of a black-box model using a local surrogate model (e.g., a linear model).
  • Feature Visualization: Generates input patterns that activate specific layers or neurons in a neural network, allowing for a visual understanding of the network's internal representations. Example: Visually understanding learned features by generating input patterns that maximally activate specific neurons or layers within a neural network.
  • Saliency Maps: Visualizes the influence of each pixel in an input image on the classification result, identifying important visual features. Example: Visualizing pixel-level importance in image classification using the gradient of the output with respect to the input.
  • Grad-CAM for X-ray Images: In anomaly detection for X-ray images, it visualizes the anatomical regions the model is focusing on, allowing doctors to confirm the basis for the diagnosis. Example: Supporting doctors' decisions in medical AI by visualizing specific anatomical regions as the basis for diagnosis using Grad-CAM.
  • Uncertainty Estimation: Supports reliable decision-making by quantifying and visualizing the uncertainty of predictions, thereby evaluating the model's confidence. Example: Assessing model prediction reliability and identifying high-risk predictions through methods like Bayesian estimation, ensemble methods, or Monte Carlo dropout to aid decision-making.
  • Counterfactual Explanations: Provides practical insights by showing "how the input should be changed to alter this prediction result." Example: Exploring minimal input perturbations to change a prediction outcome using optimization algorithms and suggesting specific actions.
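
As a concrete illustration of the SHAP entry above, a minimal sketch assuming the open-source shap package and a scikit-learn tree model (the dataset and model are my placeholders, not from the source):

```python
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.Explainer(model)      # dispatches to TreeExplainer for forests
shap_values = explainer(X.iloc[:100])  # one additive attribution per feature

# Attributions sum to (prediction - expected value) for each row,
# which is the "fair credit assignment" property from game theory.
shap.plots.bar(shap_values)            # global importance: mean |SHAP value|
```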

Related: Overview of BigQuery Explainable AI

Explainable AI (also known as XAI) makes it easier to understand the results that predictive ML models generate for classification and regression tasks by defining how each feature in a row of data contributed to the predicted result. 👀 Check the list/table.


Supervised Learning
  • Support Vector Machine (SVM) Note: A machine learning algorithm used for data classification and regression. In particular, it is a method that achieves high generalization by determining the boundary line (hyperplane) that separates two classes with the maximum margin.
  • Naive Bayes Note: A probabilistic method mainly used for categorical classification.
  • KNN (K-Nearest Neighbors) Note: Commonly used for classification.
  • Logistic Regression Note: Although it has 'regression' in its name, its purpose is probabilistic class classification.
  • Feedforward Neural Network Note: A neural network that sequentially propagates information from the input layer to the output layer, performing calculations layer by layer to ultimately generate an output. Mainly used for classification and regression problems, for example in tasks such as customer purchase prediction and stock price prediction models.
  • CNN (Convolutional Neural Network) Note: Widely used for classification tasks, including image recognition.
  • Ensemble learning Note: A method that improves prediction accuracy by combining multiple classifiers (e.g., bagging, boosting, stacking).
  • XGBoost (Extreme Gradient Boosting) Note: A gradient boosting method, often used for classification problems.
  • ROC-AUC Note: An evaluation metric for classification models (Area Under the ROC Curve).
  • Attention-based models Note: Many use cases, especially in text and image classification tasks.
  • Confusion Matrix
Unsupervised Learning
  • K-means | Clustering Note: A method that automatically classifies given data into K clusters. It performs clustering by calculating the center point (centroid) of each cluster and assigning data to the nearest center point. Mainly used in unsupervised learning.
  • PCA (Principal Component Analysis) Note: A method for reducing the dimensionality of multi-dimensional data while retaining its features. It finds a new basis that maximizes the variance of the data and extracts the most important features, which is useful for data visualization and improving processing efficiency.
Feature Engineering / Preprocessing
  • N-gram Note: A method that treats a contiguous sequence of N words or characters as a single unit. In natural language processing, it is often used to analyze the context of text and is utilized in tasks such as text generation and sentiment analysis.
  • Feature Clipping | Gradient Clipping Note: A method that limits excessive values when features or gradients exceed a certain threshold. This improves the stability of learning and can prevent problems such as vanishing or exploding gradients.
  • Log Scaling Normalization Note: A method that applies a logarithmic transformation to adjust the scale of data, especially for data with a right-skewed distribution. It helps to streamline model learning by treating a wide range of numbers more evenly.
  • Feature Crosses Note: A method that generates new features by multiplying multiple features together. By incorporating non-linear relationships into the model, it can improve prediction accuracy.
  • SMOTE (Synthetic Minority Over-sampling Technique) | Oversampling method Note: When there is little fraudulent data, SMOTE can be used to increase the number of fraudulent samples, enabling the model to detect fraud more accurately. Used in binary and multi-class classification (e.g., credit card fraud detection).
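
A minimal sketch (mine) of a few of the methods above using scikit-learn and the imbalanced-learn package; the synthetic data and parameter values are placeholders:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from imblearn.over_sampling import SMOTE

# Imbalanced toy dataset: ~5% positives, as in fraud detection.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)

# PCA: keep the directions that explain 95% of the variance.
X_reduced = PCA(n_components=0.95).fit_transform(X)

# K-means: unsupervised grouping into K=3 clusters.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_reduced)

# SMOTE: synthesize minority-class samples to rebalance the training set.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(np.bincount(y), "->", np.bincount(y_res))  # minority class is oversampled
```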

Reinforcement Learning
Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \left( R(s, a) + \gamma \max_{a'} Q(s', a') \right)

This is the Q-learning update equation, which represents the process of "learning from experience." The left side is the updated Q-value, and the right side is a weighted average of "the part that retains current knowledge" and "the part that incorporates new information." α is the learning rate (0 to 1); a larger value places more emphasis on new experiences. R is the immediate reward, γ is the discount factor for future rewards, and max_{a'} Q(s', a') represents the value of the optimal action in the next state. By repeating this update, the agent learns an optimal action strategy.
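
A minimal tabular sketch of this update rule (my own toy example; the state/action sizes, constants, and transition are placeholders):

```python
import numpy as np

n_states, n_actions = 5, 2
alpha, gamma = 0.1, 0.99                  # learning rate and discount factor
Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next):
    """One Q-learning step: blend current knowledge with the new TD target."""
    target = r + gamma * np.max(Q[s_next])            # R(s,a) + γ max_a' Q(s',a')
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target  # weighted average

# Placeholder transition: in state 0, action 1 yields reward 1.0, moving to state 2.
q_update(s=0, a=1, r=1.0, s_next=2)
print(Q[0])  # [0.0, 0.1] after one update
```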

  • SARSA (State-Action-Reward-State-Action): Note: An on-policy reinforcement learning algorithm. It updates the value function based on the action it actually takes, allowing for safer exploration. Unlike Q-learning, its key feature is updating based on the result of the next action that is actually chosen.
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right)

In the SARSA algorithm, when updating the action-value function Q(s_t, a_t), the value for the current state s_t and action a_t is adjusted based on the next state s_{t+1} and its corresponding action a_{t+1}. The update equation reflects the difference between the current value and the value in the next state, using the learning rate α and discount factor γ to adjust the influence of the next state. This allows the agent to improve its value function each time it receives a reward.

  • Monte Carlo Methods: Note: A method that updates the action-value function based on the outcome of a complete episode. Learning progresses by averaging the total rewards of entire episodes to evaluate each action. The term Monte Carlo method is a general term for techniques that use random numbers for simulation and numerical calculation. It was originally devised by Stanislaw Ulam to study the movement of neutrons in materials and was named by John von Neumann.

  • Deep Reinforcement Learning: Note: A method that combines deep learning and reinforcement learning, enabling direct processing of high-dimensional data. It facilitates learning in complex environments and is gaining attention in fields like autonomous driving and gaming.

  • Actor-Critic method: Note: An important method in reinforcement learning that learns a policy and a value function simultaneously. By performing policy improvement (Actor) and value evaluation (Critic) in parallel, it improves the stability and efficiency of learning. It's a family of reinforcement learning (RL) algorithms that combines policy-based RL algorithms like policy gradients with value-based RL algorithms like value iteration, Q-learning, SARSA, and TD learning.

  • DQN (Deep Q-Network): Note: A method that merges Q-learning with deep learning. It estimates Q-values directly from high-dimensional input data and has achieved superhuman performance in games like Atari.

  • DNN training Note: The process of training a model using a multi-layered neural network. Data is propagated forward to compute an error, which is then backpropagated to optimize the weights, thereby improving the model's prediction accuracy.
  • Neural Networks Note: A computational model inspired by the neural circuits of living organisms. It transmits information from an input layer to an output layer, with processing occurring in hidden layers. It has the ability to approximate complex functions and is widely used in image and speech recognition.
  • RNN (Recurrent Neural Network) Note: A neural network specialized for processing time-series data or data with sequential order. The output of a hidden layer is also used as an input for the next time step, allowing it to learn temporal dependencies.
  • LSTM (Long Short-Term Memory) (a type of RNN) Note: A type of RNN with the ability to learn long-term dependencies. To retain long-term information that is difficult for standard RNNs to learn, it uses a gate mechanism to select and hold important information.

Activation Functions:

  • Sigmoid (Logistic Function)

f(x) = \frac{1}{1 + e^{-x}}
  • An S-shaped activation function that transforms an input value x into the range of 0 to 1. In neural networks, it is used when the output is interpreted as a probability (especially in classification tasks). It originates from the probabilistic interpretation used in logistic regression.

  • Used in binary classification problems (e.g., logistic regression, spam filtering).

  • ReLU (Rectified Linear Unit)

f(x) = \max(0, x)
  • ReLU is a neural network activation function that outputs the input as is if it's greater than 0, and 0 otherwise. Sigmoid and Tanh functions suffered from the vanishing gradient problem; ReLU was introduced to solve this issue.

  • Widely used in models like CNNs and deep learning (especially effective in hidden layers), contributing to faster learning and mitigating the vanishing gradient problem.

  • Tanh (Hyperbolic Tangent)

f(x) = \tanh(x)
  • A function that transforms input values into the range of -1 to 1. It behaves similarly to the sigmoid function but provides a more balanced, zero-centered output range, and it is often used in RNNs.

  • Used in RNNs (for processing time-series data), especially as an activation function for hidden layers.

  • Softmax

f(\mathbf{x})_i = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}}
  • The Softmax function, or normalized exponential function, is an extension of the sigmoid function to multiple dimensions. In multi-class classification problems, it is often used as the final activation function because it can convert the neural network's output into a probability distribution.

  • Each output value falls between 0 and 1, and the sum of all outputs is 1.
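
For intuition, here is a minimal NumPy sketch (mine, not from the source) of the four activation functions above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # squashes to (0, 1)

def relu(x):
    return np.maximum(0.0, x)              # identity for x > 0, else 0

def tanh(x):
    return np.tanh(x)                      # squashes to (-1, 1), zero-centered

def softmax(x):
    e = np.exp(x - np.max(x))              # subtract max for numerical stability
    return e / e.sum()                     # non-negative, sums to 1

logits = np.array([2.0, 1.0, -1.0])
print(softmax(logits))                     # a probability distribution over 3 classes
```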

  • NLP Transformers

    • An architecture specialized for Natural Language Processing (NLP) tasks. It captures context using a Self-Attention mechanism. As a powerful alternative to traditional RNNs and LSTMs, it excels at parallel computation and can handle long sequences.
    • Used in the latest NLP models, such as machine translation, text summarization, question answering, BERT, and GPT series.
  • GAN (Generative Adversarial Network) | Unsupervised Learning

    • GAN is a model that generates data by training two neural networks (a Generator and a Discriminator) adversarially. The Generator creates new data, and the Discriminator judges whether it is real or fake; the two are optimized against each other in a minimax game.
    • Applications include image generation (DeepFakes, Art Generation), data augmentation, and image restoration/transformation (e.g., image denoising, style transfer).
  • Embeddings (What is Word Embedding?)

    • A technique for converting language, image, or audio data into a numerical format that is easier for computers to understand. It represents language data numerically, allowing the model to learn the semantic relationships between words.
    • Applications include word similarity calculation (Word2Vec, GloVe), text classification, and recommendation systems.
  • One-hot Encoding

    • A method for converting categorical data into numerical vectors. For each category, it creates a vector where only one element is 1, and all others are 0.
    • Used when inputting categorical features into a neural network. Also used to digitize labels in machine learning classification problems.
  • Original Encoding

    • A method of converting features of input data into numerical values while preserving their original state. It digitizes data without losing its original meaning or value, applying only the necessary minimal transformations.
    • Used when inputting numerical data as-is into a machine learning model, or when standardization or normalization is required.
  • word2vec | Natural Language Processing

    • Word2Vec is a natural language processing technique for representing the meaning of words as numerical vectors (embeddings).
    • In this method, each word is assigned a dense vector (coordinates in a multi-dimensional space) based on the context in which it is used. This allows the semantic relationships of language to be captured numerically.
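
A minimal sketch (mine) contrasting the One-hot Encoding and word-embedding ideas above via cosine similarity; the tiny vocabulary and vectors are placeholders rather than trained Word2Vec output:

```python
import numpy as np

vocab = ["cat", "dog", "car"]

# One-hot: sparse, orthogonal vectors with no notion of similarity between words.
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

# Toy dense embeddings (in practice learned by Word2Vec/GloVe from context).
emb = {"cat": np.array([0.9, 0.1, 0.3]),
       "dog": np.array([0.8, 0.2, 0.25]),
       "car": np.array([0.1, 0.9, 0.7])}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(one_hot["cat"], one_hot["dog"]))  # 0.0: one-hot carries no semantics
print(cosine(emb["cat"], emb["dog"]))          # high: embeddings capture similarity
print(cosine(emb["cat"], emb["car"]))          # lower: semantically distant
```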

Model Evaluation
  • Cosine Similarity Note: A method that measures the angle between two vectors in a vector space to calculate their similarity. It is used in text data analysis and recommendation systems to evaluate the similarity between items.
  • Precision and Recall Note: Important evaluation metrics for imbalanced data. Precision indicates accuracy, while Recall indicates coverage, and the balance between the two is evaluated.
  • F1 Score Note: The harmonic mean of Precision and Recall. It is particularly effective for model evaluation on imbalanced datasets.
  • ROC Curve Note: A curve that plots the true positive rate against the false positive rate. It is used to intuitively evaluate a model's performance.
  • AUC (Area Under the Curve) Note: The area under the ROC curve. A higher AUC indicates that the model is better at distinguishing between the positive and negative classes.
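
A minimal scikit-learn sketch (mine) computing the evaluation metrics above; the labels and scores are placeholder values:

```python
from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true  = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]        # ground truth (imbalanced)
y_pred  = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0]        # hard predictions
y_score = [0.1, 0.2, 0.6, 0.3, 0.9, 0.8, 0.4, 0.2, 0.7, 0.1]  # probabilities

print(confusion_matrix(y_true, y_pred))               # [[TN FP], [FN TP]]
print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall:   ", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1:       ", f1_score(y_true, y_pred))         # harmonic mean of the two
print("ROC AUC:  ", roc_auc_score(y_true, y_score))   # threshold-independent
```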

Loss Functions:

  • Mean Absolute Error (MAE) Note: An evaluation metric for regression models that shows the average of the absolute errors between predicted and actual values. It makes the model's accuracy easy to understand intuitively.
  • Mean Squared Error (MSE) Note: An error evaluation metric for regression models. It squares the errors and then takes the average, emphasizing the magnitude of larger errors.
  • Logarithmic Loss (Log Loss) Note: A metric for measuring the performance of classification models with probabilistic outputs. It takes a lower value the closer the predicted probability is to the correct class.
  • Hinge Loss Note: A loss function for classification problems, primarily used in Support Vector Machines (SVM). It is optimized to maximize the margin between classes.
  • Lift and Gain Charts Note: Tools for evaluating a model's performance, especially used in marketing and sales predictions to confirm the effectiveness of target selection. They are used to visualize how effectively a model predicts a target class and how much improvement it offers compared to a random prediction.
  • Calibration Curve Note: A method for evaluating how reliable a model's probabilistic outputs are. It shows whether the predicted probabilities match the actual frequencies of occurrence.
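
Several of these losses are available directly in scikit-learn; a minimal sketch (mine) with placeholder values:

```python
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             log_loss, hinge_loss)

# Regression: MAE treats all errors linearly; MSE punishes large errors more.
y_true, y_pred = [3.0, 5.0, 2.0], [2.5, 5.5, 4.0]
print("MAE:", mean_absolute_error(y_true, y_pred))
print("MSE:", mean_squared_error(y_true, y_pred))

# Classification: log loss rewards confident, correct probability estimates.
print("log loss:", log_loss([1, 0, 1], [0.9, 0.2, 0.6]))

# Hinge loss expects margin-style decision values (as from an SVM).
print("hinge:", hinge_loss([1, -1, 1], [0.8, -0.5, 0.3]))
```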

Model Interpretation / Explanation

Model interpretability | Explainable AI

  • Cross-validation Note: A technique to confirm a model's generalization ability. It prevents overfitting and provides a stable evaluation of model performance by repeatedly splitting the data into multiple subsets for training and validation.
  • Nested cross-validation Note: Suitable for hyperparameter optimization when you want to prevent data leakage and obtain a more reliable performance evaluation. It involves a double cross-validation loop (inner/outer).
  • What-If Tool Note: An open-source tool from Google that tests a model's performance against various inputs, analyzes the importance of data features, and visualizes model behavior across multiple models or datasets.
  • Learning Interpretability Tool (LIT) Note: A tool for visually interpreting a model's predictions and learning process. It helps explain feature importance and model behavior.
  • Integrated gradients | (Google Cloud) Note: A method that quantifies the contribution of each feature to a model's prediction. It evaluates feature importance by linearly interpolating between the input feature and a baseline (usually zero) and integrating the gradients along this path. Compared with alternative approaches, it scales to large networks and feature spaces (such as images), and it has become a popular interpretability technique due to its wide applicability to any differentiable model (e.g., images, text, structured data), ease of implementation, theoretical justification, and computational efficiency. It has diverse use cases, including understanding feature importance, identifying data skew, and debugging model performance. (A minimal sketch follows after this list.)
  • Shapley Explanation | (Google Cloud) Note: A method that quantitatively shows how much each feature contributes to a model's prediction. It is based on game theory and provides a fair distribution of the prediction's credit. The Sampled Shapley method works well for models that use meta-ensemble learning with trees and neural networks.
  • XRAI in Machine Learning | (Google Cloud) Note: A technology that provides interpretations for the prediction results of deep learning models. It visually explains which parts of the data, especially for images and text, influenced the prediction.
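
A minimal NumPy sketch of the integrated-gradients idea on a toy differentiable model (the logistic model and weights are my placeholders; real use would rely on a framework's autodiff or Vertex Explainable AI):

```python
import numpy as np

w = np.array([0.5, -1.2, 2.0])                 # toy logistic model f(x) = σ(w·x)

def f(x):
    return 1.0 / (1.0 + np.exp(-(w @ x)))

def grad_f(x):
    p = f(x)
    return p * (1.0 - p) * w                   # analytic gradient of σ(w·x)

def integrated_gradients(x, baseline, steps=50):
    # Average the gradient along the straight path baseline -> x, then scale
    # by (x - baseline); attributions approximately sum to f(x) - f(baseline).
    alphas = np.linspace(0.0, 1.0, steps)
    avg_grad = np.mean([grad_f(baseline + a * (x - baseline)) for a in alphas],
                       axis=0)
    return (x - baseline) * avg_grad

x = np.array([1.0, 0.5, 0.2])
attr = integrated_gradients(x, baseline=np.zeros_like(x))
print(attr, attr.sum(), f(x) - f(np.zeros(3)))  # completeness check
```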
Other Technologies
Method | Regularization Type | Feature Selection | Coefficient Behavior | Main Advantages | Main Disadvantages
Lasso Regression | L1 | Yes | Shrinks to zero | Automatically performs feature selection. | May not select strongly correlated features.
Ridge Regression | L2 | No | Becomes smaller | Shrinks coefficients; uses all features. | Does not perform feature selection.
Elastic Net | L1 + L2 | Yes | Shrinks toward zero | Combines benefits of Lasso and Ridge. | Hyperparameter tuning can be difficult.
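
The coefficient behavior in the table can be verified with scikit-learn; a minimal sketch (mine) with placeholder alpha values:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge, ElasticNet

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)                    # L1: zeros out features
ridge = Ridge(alpha=1.0).fit(X, y)                    # L2: shrinks, keeps all
enet  = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y) # L1 + L2 mix

print("Lasso zero coefficients:", np.sum(lasso.coef_ == 0))  # > 0: selection
print("Ridge zero coefficients:", np.sum(ridge.coef_ == 0))  # typically 0
print("ElasticNet zero coefficients:", np.sum(enet.coef_ == 0))
```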
  • Federated Learning Note: A decentralized machine learning approach. It protects privacy by training models locally on each device without centralizing data and then aggregating the results.
  • Collaborative Filtering using Matrix Factorization Note: A method for predicting unknown items based on user-item interactions. It uses matrix factorization to extract latent features of users and items, which is then utilized in recommendation systems.
  • tf.distribute.Strategy Note: An API in TensorFlow for distributed training. It's a method for efficiently training on large datasets using multiple GPUs, TPUs, or a distributed environment.
  • Maximum Likelihood Note: An estimation method in statistics. It estimates the parameters for which the observed data is most likely to occur. It is widely used for parameter estimation in probability models and for describing the population from which the observed data was generated.
L(\theta) = P(X = \{x_1, x_2, \ldots, x_n\} \mid \theta) = \prod_{i=1}^{n} f(x_i; \theta)

The observed data X is the set of actually observed data points. The probability distribution f(x; θ) is the assumed probability distribution that the data follow, characterized by the parameter θ. The likelihood function L(θ) indicates the probability of observing the given data X under the parameter θ. The product symbol ∏_{i=1}^{n} indicates that the probabilities for each data point are multiplied together. The likelihood function is used to estimate the parameter θ for which the observed data are most "likely" to have been obtained.
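
As a short worked example (mine, not from the source): for n coin flips x_1, ..., x_n ∈ {0, 1} modeled as Bernoulli(θ), maximizing the log-likelihood gives the sample mean:

\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{n} \left[ x_i \log \theta + (1 - x_i) \log(1 - \theta) \right] = \frac{1}{n} \sum_{i=1}^{n} x_i

For example, observing 7 heads in 10 flips yields \hat{\theta} = 0.7.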

Parametric and Nonparametric | Classification of ML Models

Parametric Machine Learning Algorithms

  • Features: Algorithms that assume a form for the function to be learned in advance and learn parameters based on that assumption. The number of parameters is fixed and does not change with the amount of data.
  • Learning: Learns quickly with little data.
  • Flexibility: Constrained by the assumed functional form, making it less suitable for complex problems.
  • Examples:
    • Linear Regression
    • Logistic Regression
    • Naive Bayes
    • Simple Neural Networks

Nonparametric Machine Learning Algorithms

  • Features: Algorithms that do not assume a functional form and learn the function flexibly from the data. The number of parameters depends on the amount of data, allowing for more complex functions to be learned with more data.
  • Learning: Requires a large amount of data and computational resources; learning speed is slow.
  • Flexibility: Can adapt to complex functional forms.
  • Examples:
    • k-Nearest Neighbors (k-NN)
    • Decision Trees
    • Support Vector Machines (SVM)

Lazy learning | Classification of Models

Lazy learning is a method where the model does not learn from the training data in advance but instead uses the data at prediction time to make inferences. A key characteristic is that model construction takes very little time, since it only involves storing and referencing the training data. Almost no processing occurs during the learning phase, but the computational load increases at prediction time (see the k-NN sketch after the list). Examples:

  • k-Nearest Neighbors (k-NN)
  • Local Regression
  • Lazy Naive Bayes
  • Lazy Decision Trees
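
A minimal scikit-learn k-NN sketch (mine) showing the lazy-learning trade-off: fit only stores the data, while the distance computations happen at prediction time:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)         # cheap: essentially stores the training set
print(knn.score(X_test, y_test))  # costly part: neighbor search at predict time

# The neighbors themselves can serve as the "basis" for each prediction.
dist, idx = knn.kneighbors(X_test[:1])
print(idx, y_train[idx])          # the 5 nearest training points and their labels
```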

Overview of Vertex AI | Start here first

  • With AutoML, you can train models on tabular, image, text, and video data without writing code or preparing data splits. You can deploy these models for online prediction or query them directly for batch prediction.
  • With custom training, you have full control over the training process, including using any ML framework, writing your own training code, and choosing hyperparameter tuning options. You can import your custom-trained model into Model Registry and deploy it to an endpoint for online prediction using a pre-built or custom container. Or, you can query it directly for batch prediction.
  • With Model Garden, you can explore, test, customize, and deploy Vertex AI and select open-source models and assets...

Model Garden | Vertex AI | Must-read model list 👀

Choose from world-class models from Google (Gemini), third parties (Anthropic's Claude model family), and open source (Meta's Llama 3.1) that meet Google's unique evaluation criteria. With over 160 curated models, each achieving best-in-class performance in its category, customers can leverage high-performance foundation models that best suit their business needs.

Vertex Explainable AI | Explainable AI

Supported model types: Any TensorFlow model that can provide an input embedding (latent representation) is supported. Tree-based models, such as decision trees, are not supported because they are inherently explainable. Models from other frameworks, like PyTorch or XGBoost, are not yet supported. For DNNs, it is generally assumed that the higher layers (closer to the output layer) have learned "meaningful" things, so the penultimate layer is often chosen for embeddings. You can test with a few different layers and investigate the resulting examples to select a layer based on quantitative (class match) or qualitative (looks reasonable) measures.

Overview of Vertex AI Experiments

You can track and evaluate your model's aggregate performance during training runs against a test dataset. This feature helps you understand the model's performance characteristics: how well a particular model is performing overall, where it's not performing well, and what it's good at.

Vertex AI Workbench

Vertex AI Workbench is a Jupyter notebook-based development environment for the entire data science workflow. From your Vertex AI Workbench instance's Jupyter notebook, you can interact with Vertex AI and other Google Cloud services. Vertex AI Workbench's integrations and features help you to access and process data, speed up your notebook runs, schedule notebook runs, and more.

  • Access and explore your data in a Jupyter notebook by using the BigQuery and Cloud Storage integrations.
  • Automate recurring updates to your model by using scheduled notebook runs on Vertex AI.
  • Process data quickly by running a notebook on a Dataproc cluster.

Vertex AI Model Monitoring

With Vertex AI Model Monitoring, you can run monitoring jobs on-demand or on a schedule to track the quality of your tabular models. If you set up alerts, Vertex AI Model Monitoring notifies you when metrics exceed thresholds you specify. For example, you have a model that predicts a customer's lifetime value. As customer habits change, so do the factors that predict customer spend. As a result, the features and feature values that you previously used to train your model might not be relevant for current predictions. This deviation in data is called drift.

Vertex AI Vizier

Vertex AI Vizier is a black-box optimization service that helps you tune hyperparameters in complex machine learning (ML) models. When an ML model has many different hyperparameters, manual tuning can be difficult and time-consuming. With Vertex AI Vizier, you can tune your hyperparameters and optimize your model's output. The default algorithm uses Bayesian optimization to more efficiently search the parameter space and derive the optimal solution. Black-box optimization is the optimization of a system that meets either of the following criteria:

  • It has no known objective function to evaluate.
  • The objective function is too expensive to evaluate, often due to the complexity of the system. (Vizier: advisor)

Occupancy analytics | Vertex AI

The Occupancy analytics model lets you count the number of people and vehicles based on specific inputs that you add to your video frames. Compared with the people and vehicle detection model, it provides advanced features: active zone counting, line crossing counting, and dwell detection.

Vertex AI Agent Builder

Vertex AI Agent Builder lets developers, even those with limited ML skills, tap into the power of Google's foundation models, search expertise (semantic search), and conversational AI technologies to create enterprise-grade generative AI applications.

TensorFlow Data Validation

This includes looking at descriptive statistics, inferring a schema, checking for and fixing anomalies, and checking for drift and skew in our dataset. It's important to understand your dataset's characteristics, including how it might change over time in your production pipeline. It's also important to look for anomalies in your data, and to compare your training, evaluation, and serving datasets to make sure that they're consistent.

TensorFlow I/O

TensorFlow I/O is an extension package for TensorFlow that contains a collection of file systems and file formats that are not available in TensorFlow's built-in support (e.g., format conversion). It allows for integration with a number of systems and cloud vendors.

TensorFlow Model Analysis (TFMA)

The goal of TensorFlow Model Analysis is to provide a mechanism for model evaluation in TFX. Using TensorFlow Model Analysis, you can perform model evaluation in your TFX pipeline, and view the resulting metrics and plots in a Jupyter notebook. Specifically, it can provide:

TensorFlow Lite (TFLite)

LiteRT (short for Lite Runtime), formerly known as TensorFlow Lite, is Google's high-performance runtime for on-device AI. You can find ready-to-run LiteRT models for a wide range of ML/AI tasks. You can also use the AI Edge conversion and optimization tools to convert your TensorFlow, PyTorch, and JAX models to the TFLite format to run them. Multi-platform support: compatible with Android devices, iOS devices, embedded Linux, and microcontrollers.

Get predictions for a forecast model | Vertex AI

To make a batch prediction request, you can use either the Google Cloud console or the Vertex AI API. Your input data source is a CSV object stored in a Cloud Storage bucket or a BigQuery table. AutoML forecasting does not support endpoint deployment and online predictions. To request online predictions from a forecasting model, use the tabular workflow for forecasting.

Use a custom container for prediction | Vertex AI

To customize how Vertex AI serves online predictions from your custom-trained model, you can specify a custom container instead of a pre-built container when you create a Model resource. With a custom container, Vertex AI runs an arbitrary Docker container of your choice on each prediction node.

Hello image data: Train an AutoML image classification model

Incremental training usually converges faster, resulting in a shorter training time than training from scratch.

Manage BigQuery ML models in Vertex AI

Registering BigQuery ML models to the Vertex AI Model Registry lets you manage them alongside your Vertex AI models without exporting them. After you register a model to the Model Registry, you can use a single interface to version, evaluate, and deploy your models for online prediction without a serving container. You can register a model to the Model Registry by using the MODEL_REGISTRY option in the CREATE MODEL statement.
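
A rough sketch of this option in use (the project, dataset, table, and model names are placeholders; verify the exact OPTIONS syntax against the BigQuery ML docs):

```python
# Hypothetical example: train a BigQuery ML model and register it to
# Vertex AI Model Registry in one CREATE MODEL statement.
from google.cloud import bigquery

client = bigquery.Client()  # assumes application-default credentials

query = """
CREATE OR REPLACE MODEL `my_dataset.churn_model`      -- placeholder names
OPTIONS (
  model_type = 'LOGISTIC_REG',
  model_registry = 'VERTEX_AI',          -- register to Vertex AI Model Registry
  vertex_ai_model_id = 'churn_model'
) AS
SELECT * FROM `my_dataset.churn_training_data`
"""
client.query(query).result()   # blocks until training completes
```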

Tips for reducing memory usage | TensorFlow - Troubleshooting TPUs

  • Check for excessive tensor padding
    • Tensor padding is an operation performed to adjust the size of input data so that the model can process it correctly. It is mainly used in convolutional layers and RNNs, with zero-padding being the most common.
  • Use the bfloat16 format.
  • If your input size (model) is too large, you might be able to use TensorFlow's experimental model parallelism to fit the model.

Vertex AI Feature Store

Vertex AI Feature Store is a cloud-native, fully managed feature store service that is an integral part of Vertex AI. It streamlines the ML feature management and online serving process. You can manage your feature data in tables or views in BigQuery and serve online directly from BigQuery data sources. Vertex AI Feature Store provisions resources that let you set up online serving by specifying feature data sources. It then acts as a metadata layer that interacts with the BigQuery data sources. It serves the latest feature values directly from BigQuery for online predictions with low latency.

Custom Prediction Routines | Vertex AI

Custom prediction routines (CPR) are an easy way for you to build a custom container with pre- and postprocessing code, without having to configure an HTTP server or build a container from scratch. You can use preprocessing to normalize or transform your inputs, call an external service for additional data, and use postprocessing to format your model's prediction, or run business logic.

  • You don't need to write a model server or a Dockerfile. A model server is provided for you.
  • You can deploy and debug your model locally, which can accelerate the iteration cycle during development.

Use managed datasets | Vertex AI

Learn how to train a custom model using a Vertex AI managed dataset. With a managed dataset, you can do the following:

  • Manage your datasets in one central place.
  • Easily create labels and multiple annotation sets.
  • Create a human labeling task using integrated data labeling.
  • Track your model lineage for governance and iterative development.
  • Train AutoML models and custom models using the same dataset and compare their performance.
  • Generate and visualize data statistics.
  • Automatically split your data into a training, test, or validation set.

Build your own Retrieval Augmented Generation | Vertex AI Agent Builder

The Directed Acyclic Graph (DAG) is a helpful reference.

ExampleGen TFX | TensorFlow

ExampleGen is a TFX pipeline component that ingests data into TFX pipelines. It consumes data from external files/services and generates Examples which will be read by other TFX components. It also provides a consistent and configurable partition of the dataset, and shuffles the dataset for ML best practice. Input: Data from external data sources like CSV, TFRecord, BigQuery. Output: tf.Example records.

AI Accelerators | Google Cloud | Official

AI accelerators are specialized hardware or software designed to speed up AI processing. They mainly streamline the training and inference of deep learning and machine learning.

  1. TPU (Tensor Processing Unit)
    • Specialized hardware developed by Google to accelerate Deep Learning.
    • For models dominated by matrix computations.
    • TPUs are not recommended for workloads that require high-precision arithmetic and are recommended for models that train for weeks or months (from practice exam).
    • Significant performance and cost-effectiveness improvements for large-scale training compared to GPUs.
  2. GPU (Graphics Processing Unit)
    • Has high parallel computing capability, used for AI model training; NVIDIA products are mainstream.
    • GPUs are hardware well-suited for deep learning training that involves high-precision training, and by distributing training across multiple instances, they offer maximum flexibility for fine-tuning accelerator choices to minimize execution time (from practice exam).
  3. CPU
    • For rapid prototyping that requires maximum flexibility.
    • For simple models that don't take long to train.
    • For small models with small effective batch sizes.
    • For models that are dominated by custom TensorFlow operations written in C++.

State of the scheduler API in Vertex AI | Vertex AI

ACTIVE | PAUSED | COMPLETED

ConditionalParameter | Vertex AI | Cost Reduction

A ConditionalParameterSpec object lets you add a hyperparameter to a trial when the value of a parent hyperparameter matches a specified condition. For example, you can define a hyperparameter tuning job to find the best model using either linear regression or a deep neural network (DNN). To specify the training method in your tuning job, you define a categorical hyperparameter named training_method with LINEAR_REGRESSION and DNN. When training_method is LINEAR_REGRESSION, the tuning job needs to specify a hyperparameter for the learning rate. When training_method is DNN, the tuning job needs to specify hyperparameters for the learning rate and the number of hidden layers.
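
An illustrative sketch of that example as the kind of nested parameter structure the API expects, written as a Python dict (field names mirror my reading of the Vertex AI REST StudySpec shape and should be checked against the reference):

```python
# Hypothetical StudySpec fragment mirroring the example above: the learning
# rate applies to both methods, hidden-layer count only when training_method
# is DNN. Field names are assumptions based on the REST API, not verified here.
study_parameters = [{
    "parameterId": "training_method",
    "categoricalValueSpec": {"values": ["LINEAR_REGRESSION", "DNN"]},
    "conditionalParameterSpecs": [
        {   # learning_rate is tuned for both parent values
            "parentCategoricalValues": {"values": ["LINEAR_REGRESSION", "DNN"]},
            "parameterSpec": {
                "parameterId": "learning_rate",
                "doubleValueSpec": {"minValue": 1e-5, "maxValue": 1e-1},
                "scaleType": "UNIT_LOG_SCALE",
            },
        },
        {   # num_hidden_layers only exists when the DNN branch is chosen
            "parentCategoricalValues": {"values": ["DNN"]},
            "parameterSpec": {
                "parameterId": "num_hidden_layers",
                "integerValueSpec": {"minValue": 1, "maxValue": 8},
            },
        },
    ],
}]
```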

Sidecar mode and ESP | Cloud Endpoints

  • Sidecar mode: A method where a proxy providing API management functions (Extensible Service Proxy (ESP)) runs within the same container as each application instance.
  • ESP: A proxy that provides API request authentication, traffic management, monitoring, logging, etc. (Summarized by GPT).
API Management System (Example sidecar mode configuration)

├── Application Service
│ ├── Application Logic (API)
│ └── Data Storage, External Services, etc.

├── ESP (API Management Proxy)
│ ├── Authentication Function
│ ├── Traffic Control
│ ├── Monitoring Function
│ ├── Logging Function
│ └── Rate Limiting

└── Other Components
├── Request Gateway (Accepts API requests)
└── API Gateway (Coordinates proxy and application service)

Prevent overfitting

A caveat for training BigQuery ML models is overfitting. Overfitting is when a model fits the training data too well, resulting in poor performance on new data. BigQuery ML supports two methods for preventing overfitting: early stopping and regularization.

Tune hyperparameters | Overfitting

  • Dropout rate: The dropout layer is used for regularization in the model. It defines the fraction of the input units to drop. Recommended range: 0.2 to 0.5.
  • Learning rate: The amount by which the neural network weights are adjusted between iterations. A learning rate that is too high can cause the weights to fluctuate wildly and may prevent them from finding their optimal values. A low learning rate is fine, but it requires more iterations to converge. We recommend starting with 1e-4. If your training is very slow, increase this value. If your model is not learning, try lowering the learning rate.
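
A minimal Keras sketch (mine) wiring in both knobs with the recommended starting values; the architecture itself is a placeholder:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dropout(0.3),          # drop 30% of inputs: within 0.2-0.5
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Start with the recommended 1e-4; raise it if training is very slow,
# lower it if the model is not learning.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```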

Types of prediction logs | Vertex AI | Online prediction logging

  • Container logging: The stdout and stderr streams from your prediction nodes are logged to Cloud Logging. These logs are useful for debugging. For v1 service endpoints, container logging is enabled by default; you can disable it when you deploy a model, and you can disable or enable it when you modify a deployed model.
  • Access logging: Information such as the timestamp and latency of each request is logged to Cloud Logging. For both v1 and v1beta1 service endpoints, access logging is disabled by default; you can enable it when you deploy a model to an endpoint.
  • Request-response logging: A sample of online prediction requests and responses is logged to a BigQuery table. To enable request-response logging, create or patch a prediction endpoint.

Choose between Healthcare Natural Language API and AutoML Entity Extraction for Healthcare | Cloud Healthcare API

The Healthcare Natural Language API provides a pre-trained natural language model that extracts medical concepts and relationships from medical text. The Healthcare Natural Language API maps text to a predefined set of medical knowledge categories. AutoML Entity Extraction for Healthcare lets you create a custom entity extraction model trained using your own annotated medical text and your own categories.

Dataflow ML

Dataflow ML uses a combination of Dataflow and the Apache Beam RunInference API. The RunInference API lets you define your model's characteristics and properties and then pass that configuration to the RunInference transform. This feature lets you run your model within your Dataflow pipeline without needing to know the implementation details of the model. You can choose the framework that works best for your data, such as TensorFlow or PyTorch.

RunInference API | Dataflow

You can use the RunInference API to build a pipeline that contains multiple models. A multi-model pipeline is useful for tasks such as conducting A/B tests or building an ensemble to solve a business problem that requires multiple ML models. When building a pipeline with multiple models, you can use one of two patterns:

  • A/B branch pattern: A portion of the input data is sent to one model, and the rest of the data is sent to a second model.
  • Sequence pattern: The input data passes through two models in sequence.
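
A minimal Apache Beam sketch of a single-model RunInference pipeline with a scikit-learn model handler (the model URI is a placeholder; check handler class names against the Beam docs):

```python
import numpy as np
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy

model_handler = SklearnModelHandlerNumpy(
    model_uri="gs://my-bucket/models/model.pkl")      # placeholder URI

with beam.Pipeline() as p:
    _ = (p
         | "CreateExamples" >> beam.Create([np.array([1.0, 2.0]),
                                            np.array([3.0, 4.0])])
         | "Predict" >> RunInference(model_handler)   # framework-agnostic transform
         | "Print" >> beam.Map(print))                # PredictionResult elements
```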

Scikit-learn | Primarily CPU-based -> GPU support becoming available

  • Scikit-learn is mainly optimized for CPU-based computation and is suitable for small to medium-sized datasets. Performance may be limited for large datasets or when real-time responses are needed.
  • If you want to leverage GPUs, consider NVIDIA's cuML library, which has an API similar to scikit-learn and can be run in environments like a Deep Learning VM (DLVM).
  • Since 2023, a growing (although still limited) number of scikit-learn estimators can run on a GPU if the input data is provided as a PyTorch or CuPy array and scikit-learn is configured to accept such inputs.
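
A hedged sketch of that Array API route (assumes a GPU, CuPy and the Array API compatibility packages installed, and an estimator with Array API support such as LinearDiscriminantAnalysis; availability varies by scikit-learn version):

```python
# Experimental: run a supporting scikit-learn estimator on GPU via CuPy arrays.
import cupy as cp
from sklearn import config_context
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X_gpu = cp.random.rand(10_000, 20)             # data already resident on the GPU
y_gpu = (cp.random.rand(10_000) > 0.5).astype(cp.int64)

with config_context(array_api_dispatch=True):  # opt in to Array API inputs
    lda = LinearDiscriminantAnalysis().fit(X_gpu, y_gpu)
    preds = lda.predict(X_gpu)                 # stays on the GPU as a CuPy array
```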

Request-Response Logging | samplingRate | Vertex AI

On dedicated and Private Service Connect endpoints, you can use request-response logging to log request-response payloads smaller than 10 MB for TensorFlow, PyTorch, sklearn, and XGBoost models.

  • samplingRate: The fraction of requests and responses to log. Must be a value greater than 0 and less than or equal to 1. For example, to log every request, set this value to 1; to log 10% of requests, set it to 0.1.
  • Lowering samplingRate reduces monitoring costs, since fewer data points are logged and considered for monitoring; the trade-off is that drift takes longer to detect.
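
An illustrative sketch of where samplingRate lives when enabling request-response logging, as a Python dict mirroring the REST predictRequestResponseLoggingConfig field (the BigQuery destination is a placeholder; verify field names against the Vertex AI reference):

```python
# Hypothetical endpoint patch body: log 10% of online prediction
# requests/responses to a BigQuery table. Names are placeholders.
logging_config = {
    "predictRequestResponseLoggingConfig": {
        "enabled": True,
        "samplingRate": 0.1,                     # log 1 in 10 requests
        "bigqueryDestination": {
            "outputUri": "bq://my-project.logging_dataset.req_resp"
        },
    }
}
```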

March 30, 2025 - Failure Report

For Next Time:

It's possible my choice of practice questions was wrong from the start, but for the next attempt, I'll start by reviewing my existing practice questions and aim for perfection on the practice exams. I'll also take this as a chance to deepen my understanding of fundamental ML concepts.

Reflections on Failing (2025/03/30)

Exam Overview

  • 50 questions / 120 minutes
  • I barely finished the first pass in 105 minutes and could only review about 2 questions.

Impressions of the Exam

  • Difficult. And many of the English words were hard to understand.
  • More than I expected, there were many complex questions combining Google Cloud services with ML implementation, training, and improvement, rather than just fundamental ML understanding.
  • In particular, I felt there were many questions about implementing ML pipelines.
  • Frustrating. Especially since I had gone through the practice questions twice, scored around 70% on past exams, and also studied with videos.
  • I also have a strong impression that not many similar questions appeared from the past exams.


Question Trends

  • Many questions about ML pipelines / data processing.
  • There were also questions about selecting recent models like Gemini and Llama.
  • TensorFlow I/O
  • A few questions on technology selection from pipeline to monitoring.
  • Many questions on selecting targets for performance tuning (TPU/GPU...).
  • I recall questions about selecting recommendations where only 'Matrix Factorization' from 'Collaborative Filtering using Matrix Factorization' was an option.
  • The softmax function appeared many times.
  • Many questions that tested the understanding of classification vs. regression problems while selecting models.
  • Many questions about explainability.
  • Questions about data anonymization.
  • The word 'ExampleGen' appeared frequently.
  • The RunInference API also appeared frequently.
  • The word 'accelerator' was also frequent.
  • The word 'Metadata' appeared quite a bit in questions about pipeline configuration.

Failure Report Table:

Section | Approximate % of Scored Questions | Section Performance
Section 1: Architecting low-code AI solutions | 13% | Meets
Section 2: Collaborating within and across teams to manage data and models | 14% | Does Not Meet
Section 3: Scaling prototypes into ML models | 18% | Does Not Meet
Section 4: Serving and scaling models | 20% | Borderline
Section 5: Automating and orchestrating ML pipelines | 22% | Does Not Meet
Section 6: Monitoring AI solutions | 13% | Does Not Meet

2025/04/15: Submitted Improvement Request to Udemy Practice Exam Author

  • 2025/04/16: Submitted the following request on Udemy.
  • 2025/04/16: The author responded with thanks and offered a free coupon.
Improvement Request

Subject: Improvement Request regarding Google Cloud Certification Professional Machine Learning Practice Exam