

Resolving the Paddle OCR Memory Leak (OOM) Issue

Data Science

by Taeyoon.Kim.DS 2023. 11. 2. 19:00


To resolve this issue, OCR-enabled products are routed through a queue connected to a t3.large instance.

1. Run prediction.

2. SWOOP exports the prediction JSON.

3. SWOOP sends a message to SQS. You can check the export queues from Resque.

4. An EC2 instance picks up a message from SQS - ai-heavy-input-queue (t3.large) / ai-input-queue (t3.medium).

5. The EC2 instance completes the prediction and sends a message to SQS - ai-output-queue. If it fails, the message goes to ai-dead-letter-output-queue.

 

For the final test, run prediction on 16,000 OCR-enabled product listings and confirm that the whole batch completes without an OOM error. Since there are only two image sizes involved, an OOM error is unlikely, but it is still worth running the full batch and checking how long it takes.

 

 

To resolve the issue with OCR-enabled products, you need to set up a workflow that involves AWS services such as SQS (Simple Queue Service) and EC2 instances.


Run Prediction:
Start the process by running the OCR prediction. This could be an application or a script that processes images or documents to extract text.
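
A minimal sketch of this step, assuming PaddleOCR's standard Python API (the image paths and language setting are placeholders):

from paddleocr import PaddleOCR

# Initialise the OCR engine once and reuse it across images.
ocr = PaddleOCR(use_angle_cls=True, lang="en")

def run_prediction(image_paths):
    # Each call returns the detected text boxes with (text, confidence) pairs.
    return {path: ocr.ocr(path, cls=True) for path in image_paths}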


SWOOP Exports Prediction JSON:
Once the prediction is complete, the SWOOP system (assuming it's your application or service) should export the results in a JSON format.
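
How SWOOP serialises its predictions is not shown here, so the export below is only an illustrative sketch using the standard json module:

import json

def export_prediction(prediction, out_path):
    # Write the prediction results to a JSON file for the downstream queue.
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(prediction, f, ensure_ascii=False, indent=2)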


Application Sends a Message to SQS:
The application needs to send a message to an AWS SQS queue. This message should contain information or references (such as file paths or identifiers) to the exported JSON.
You can check and manage these export queues using Resque, which is a Redis backed library for creating background jobs.
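
A sketch of the enqueue step, assuming boto3; the message body shown here (a listing id plus the path of the exported JSON) is an assumption, not the actual SWOOP payload:

import json
import boto3

sqs = boto3.client("sqs")
heavy_queue_url = sqs.get_queue_url(QueueName="ai-heavy-input-queue")["QueueUrl"]

def enqueue_export(listing_id, export_path):
    # Point the worker at the exported JSON rather than embedding it in the message.
    sqs.send_message(
        QueueUrl=heavy_queue_url,
        MessageBody=json.dumps({"listing_id": listing_id, "export_path": export_path}),
    )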


EC2 Instance Picks Up a Message from SQS:
Set up two types of EC2 instances: t3.large for heavy input processing and t3.medium for standard processing.
Configure these instances to poll the SQS queue (ai-heavy-input-queue for t3.large and ai-input-queue for t3.medium) and process messages as they arrive.
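
A sketch of the worker loop on the instance, assuming boto3; long polling (WaitTimeSeconds) keeps the instance from busy-waiting on an empty queue:

import boto3

sqs = boto3.client("sqs")
input_queue_url = sqs.get_queue_url(QueueName="ai-heavy-input-queue")["QueueUrl"]

def poll_forever(handle_message):
    while True:
        resp = sqs.receive_message(
            QueueUrl=input_queue_url,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=20,  # long polling
        )
        for msg in resp.get("Messages", []):
            handle_message(msg["Body"])
            # Delete only after successful processing so failed messages are retried.
            sqs.delete_message(QueueUrl=input_queue_url, ReceiptHandle=msg["ReceiptHandle"])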


EC2 Instance Completes Prediction and Sends a Message to SQS:
After processing, the EC2 instance should send a message to another SQS queue (ai-output-queue) with the results of the prediction. In case of failure, the message should be sent to a different queue, typically a dead-letter queue (ai-dead-letter-output-queue), for error handling and further investigation.
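
A sketch of the result-reporting step, assuming boto3; run_ocr stands in for the actual prediction code:

import json
import boto3

sqs = boto3.client("sqs")
output_url = sqs.get_queue_url(QueueName="ai-output-queue")["QueueUrl"]
dead_letter_url = sqs.get_queue_url(QueueName="ai-dead-letter-output-queue")["QueueUrl"]

def report_result(listing_id, run_ocr):
    try:
        result = run_ocr(listing_id)
        sqs.send_message(QueueUrl=output_url, MessageBody=json.dumps(result))
    except Exception as exc:
        # Failed jobs go to the dead-letter queue for later investigation.
        sqs.send_message(
            QueueUrl=dead_letter_url,
            MessageBody=json.dumps({"listing_id": listing_id, "error": str(exc)}),
        )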


Additional Considerations:
Scaling and Load Balancing: Depending on the volume of predictions, you might need to scale your EC2 instances. AWS Auto Scaling can help in dynamically adjusting the number of instances.
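
One way to drive scaling from the queue backlog, assuming boto3 and an existing Auto Scaling policy (its ARN and the threshold below are placeholders):

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="ai-heavy-input-queue-backlog",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": "ai-heavy-input-queue"}],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=2,
    Threshold=100,  # placeholder backlog threshold
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["<scale-out-policy-arn>"],  # placeholder Auto Scaling policy ARN
)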


Monitoring and Logging: Implement monitoring and logging (using AWS CloudWatch or similar tools) to track the performance and catch any issues in the workflow.
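
A minimal sketch of publishing a custom metric to CloudWatch, assuming boto3; the namespace and metric name are illustrative:

import boto3

cloudwatch = boto3.client("cloudwatch")

def record_prediction_duration(seconds):
    # Track how long each prediction takes so slowdowns stand out.
    cloudwatch.put_metric_data(
        Namespace="OCR/Predictions",
        MetricData=[{
            "MetricName": "PredictionDurationSeconds",
            "Value": seconds,
            "Unit": "Seconds",
        }],
    )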


Security: Ensure that all data, especially if it's sensitive, is handled securely throughout the process. Use IAM roles and policies to control access to AWS resources.


Error Handling: Implement robust error handling, especially in the interaction with SQS and during the prediction process. This includes retries, dead-letter queues, and alerting mechanisms.
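
One common way to get retries plus a dead-letter queue is a redrive policy on the input queue, sketched below with boto3; the maxReceiveCount and the reuse of ai-dead-letter-output-queue as the target are assumptions:

import json
import boto3

sqs = boto3.client("sqs")
input_url = sqs.get_queue_url(QueueName="ai-heavy-input-queue")["QueueUrl"]
dlq_url = sqs.get_queue_url(QueueName="ai-dead-letter-output-queue")["QueueUrl"]
dlq_arn = sqs.get_queue_attributes(
    QueueUrl=dlq_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# After three failed receives, SQS moves the message to the dead-letter queue.
sqs.set_queue_attributes(
    QueueUrl=input_url,
    Attributes={"RedrivePolicy": json.dumps(
        {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": "3"}
    )},
)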

 
