[MLFlow] How to Use MLFlow Tracking

Data Science

by Taeyoon.Kim.DS 2023. 11. 3. 02:22

https://www.youtube.com/watch?v=x3cxvsUFVZA&list=PL2ivGIeWjZUhs_NX_xwR8yggvyB6wSnQh

 

git clone https://github.com/dmatrix/mlflow-workshop-part-1.git

 

ML
Goal:

Optimise a metric (e.g. accuracy) by experimenting constantly to improve it.

Model quality depends on the input data and on parameter tuning.

Compare and combine many libraries and models.

Support diverse deployment environments.

 

ML cycle

Raw Data -> Data Prep -> Training -> Deploy -> Raw Data -> Data Prep -> Training -> Deploy -> Raw Data

For example:
1. Use XGBoost as a baseline, or use scikit-learn, and compare the results.

2. Tune each tool's parameters consistently.

3. Deal with large volumes of data.

4. How do you ensure that the best trained model is the one that actually gets deployed?

5. Govern data privacy and ethics.

 

## Custom ML platforms

+ Some big data companies standardise the data prep (JSON) / training (EC2 instance) / deploy (SQS, EC2 instance, listeners and queues) loop.

- Limited to a few algorithms or frameworks

- Tied to one company's infrastructure

- Out of luck if you leave the company

 

MLFlow

Works with popular ML libraries and languages

Runs the same way anywhere

Designed to be useful for 1 or 1000+ person orgs

Simple. Modular. Easy to use.

Offers a positive developer experience for getting started.

 

MLFlow Components

Tracking - Record and query experiments: code, data, config, etc.

Parameters - the inputs you want to track, e.g. how many trees I used for a random forest.

Metrics - numeric values, e.g. the loss value or a saved ROC AUC score.

Other tracked concepts: Tags and Notes, Artifacts, Source, Version, Run, Experiment

Run: an instance of code that is run by MLflow

Experiment: {Run, ... Run}, i.e. a collection of runs
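To make the Run / Experiment relationship concrete, here is a minimal conceptual sketch using only the Python standard library. The `Run` and `Experiment` classes below are hypothetical stand-ins written for these notes, not MLflow's own classes, and the values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One execution of training code (hypothetical stand-in, not MLflow's class)."""
    run_id: str
    params: dict = field(default_factory=dict)   # inputs, e.g. number of trees
    metrics: dict = field(default_factory=dict)  # numeric results, e.g. roc_auc
    tags: dict = field(default_factory=dict)     # free-form labels and notes


@dataclass
class Experiment:
    """A named collection of runs."""
    name: str
    runs: list = field(default_factory=list)


# Made-up values, for illustration only.
exp = Experiment("rf-baseline")
exp.runs.append(Run("run-1", {"n_estimators": 100}, {"roc_auc": 0.91}))
exp.runs.append(Run("run-2", {"n_estimators": 300}, {"roc_auc": 0.93}))

print(len(exp.runs))  # 2
```

Each run carries its own parameters, metrics, and tags, and the experiment is just the container that groups them.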

Old way of tracking - write results to a log file or a JSON file, with no real tracking.

What if the learning rate (lr) changes, or the code version changes?
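For contrast, the old manual approach might look roughly like the sketch below. The `log_result` helper, the file name, and the numbers are all hypothetical. The point is that nothing ties a record to the code version or data that produced it.

```python
import json
import tempfile
from pathlib import Path


def log_result(path: Path, params: dict, metrics: dict) -> None:
    """Append one result as a JSON line; nothing links it to code or data."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"params": params, "metrics": metrics}) + "\n")


# Hypothetical file name inside a temporary directory.
log_path = Path(tempfile.mkdtemp()) / "results.jsonl"
log_result(log_path, {"lr": 0.1}, {"roc_auc": 0.90})   # made-up values
log_result(log_path, {"lr": 0.01}, {"roc_auc": 0.92})

# Comparing results means re-parsing the file by hand every time.
lines = log_path.read_text(encoding="utf-8").splitlines()
records = [json.loads(line) for line in lines]
print(len(records))  # 2
```

This works for a handful of results, but once lr, code, and data all change between attempts, the file alone cannot tell you which combination produced which score.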

With MLflow, you can then run mlflow ui and dig deeper into each run: what the run consists of, which artifacts it used, and how its metrics compare side by side with other runs. This gives you more systematic tracking of your experiments. Which run has the best metric? That is the model I will choose for the next stage.

 

 

Projects

Models

Model Registry

 

 

 

 
