https://www.youtube.com/watch?v=x3cxvsUFVZA&list=PL2ivGIeWjZUhs_NX_xwR8yggvyB6wSnQh
git clone https://github.com/dmatrix/mlflow-workshop-part-1.git
ML
Goal:
Optimise a metric (e.g. accuracy) and constantly experiment to improve it.
Quality depends on input data and tuning parameters.
Compare and combine many libraries and models
Diverse deployment environments
ML cycle
Raw Data -> Data Prep -> Training -> Deploy -> Raw Data -> Data Prep -> Training -> Deploy -> Raw Data
For example, the cycle raises these challenges:
1. Using XGBoost as a baseline, or using scikit-learn, and comparing the two.
2. Tuning each tool's parameters consistently.
3. Dealing with large-scale data.
4. Model exchange: how do you actually ensure that the best trained model is the one that gets deployed?
5. Governing data privacy and ethics.
## Custom ML platforms
+ Some big data companies standardise the data prep (JSON) / training (EC2 instance) / deploy (SQS, EC2 instance, listeners and queues) loop.
- Limited to a few algorithms or frameworks
- Tied to one company's infrastructure
- Out of luck if you leave the company
MLflow
Works with popular ML libraries & languages
Runs the same way anywhere
Designed to be useful for 1 or 1000+ person orgs
Simple. Modular. Easy to use.
Offers a positive developer experience to get started.
MLflow Components
Tracking - Record and query experiments: code, data, config, etc.
Parameters - the parameters you want to track, e.g. how many trees I used for a random forest.
Metrics - numeric values, e.g. the value of your loss function or a saved ROC AUC score
Also: Tags and Notes, Artifacts, Source, Version, Run, Experiment
Run: an instance of code that is run by MLflow
Experiment: {Run, ... Run}
Old way of tracking - log files, or dumping values into a JSON file; no real tracking.
What if you tune lr, or the code version changes?
Then run mlflow ui - for each run you can dig deeper into what the run consists of and which artifacts were used, and compare metrics side by side. This gives you more systematic tracking of your experiments: which run has the best metric? I will choose that model for the next stage.
Projects
Models
Model Registry