[MLFlow] How to Use MLFlow Tracking

Data Science

by Taeyoon.Kim.DS 2023. 11. 3. 02:22

https://www.youtube.com/watch?v=x3cxvsUFVZA&list=PL2ivGIeWjZUhs_NX_xwR8yggvyB6wSnQh

git clone https://github.com/dmatrix/mlflow-workshop-part-1.git

ML goals:

Optimise a metric (e.g. accuracy) and constantly experiment to improve it.

Quality depends on the input data and the tuning parameters.

Compare and combine many libraries and models.

Diverse deployment environments.

ML cycle

Raw Data -> Data Prep -> Training -> Deploy -> Raw Data -> Data Prep -> Training -> Deploy -> Raw Data

For example:
1. Using XGBoost as a baseline, or using scikit-learn, and comparing the two.

2. Tuning each tool's parameters consistently.

3. Dealing with large-scale data.

4. How do you actually ensure that the best trained model is the one that gets deployed?

5. Governing data privacy and ethics.

## Custom ML platforms

+ Some big data companies standardise the data prep (JSON) / training (EC2 instances) / deploy loop (SQS, EC2 instances, listeners and queues).

- Limited to a few algorithms or frameworks

- Tied to one company's infrastructure

- Out of luck if you leave the company

MLflow

Works with popular ML libraries and languages

Runs the same way anywhere

Designed to be useful for 1 or 1000+ person orgs

Simple. Modular. Easy to use.

Offers a positive developer experience to get started.

MLflow Components

Tracking - Record and query experiments: code, data, config, etc.
Parameters - the inputs you want to track, e.g. how many trees I used for a random forest.

Metrics - numeric results, such as the loss value or the ROC AUC score.

Other tracking concepts: Tags and Notes, Artifacts, Source, Version, Run, Experiment

Run: an instance of code that is executed and recorded by MLflow

Experiment: a group of runs, {Run, ..., Run}

Old way of tracking: write results to a log file or a JSON file, with no systematic tracking.

What happens when you tune the learning rate, or when the code version changes?

Then run `mlflow ui`: for each run you can dig into what the run consists of and which artifacts it used, and compare metrics side by side. This gives you a more systematic way of tracking your experiments. Which run has the best metric? I will choose that model for the next stage.

Projects

Models

Model Registry

Attribution-NonCommercial-NoDerivs
