Python code for prompt engineering email anonymization (removal of PII)
Spreadsheet tracking prompt changes: https://docs.google.com/spreadsheets/d/11JkPmPXgS-eLh4ZjXaHmuY-d3s_-BBBZSaC_Uj8ZSog/edit?usp=sharing
Issue List:
- [DONE] Anonymization of some newsletter emails takes a very long time (will remove links in preprocessing and re-evaluate)
- [DONE] Labeling is relatively inaccurate (will try allowing multiple labels per email)
- [DONE] JSON file is often wrong (will try getting the tool_calls response + further debugging)
- [IN PROGRESS] tool_calls is None even when running the example code from the documentation; need to find updated code that ensures it is not None
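
While the tool_calls issue is open, a defensive parser may help keep the pipeline running. The sketch below is not the project's code: it assumes an OpenAI-style message object where tool_calls is either None or a list whose items expose .function.arguments as a JSON string, and where content may hold plain text. The function name extract_tool_arguments is hypothetical.

```python
import json


def extract_tool_arguments(message):
    """Return the parsed arguments dict from a chat message, or None.

    Assumption: `message` looks like an OpenAI-style chat message, i.e.
    `tool_calls` is None or a list of objects with `.function.arguments`
    (a JSON string), and `content` is None or plain text.
    """
    tool_calls = getattr(message, "tool_calls", None)
    if tool_calls:
        try:
            # Use the first tool call; its arguments arrive as a JSON string.
            return json.loads(tool_calls[0].function.arguments)
        except json.JSONDecodeError:
            return None

    # Fallback: the model answered in plain text instead of calling the tool.
    content = getattr(message, "content", None)
    if content:
        try:
            parsed = json.loads(content)
        except json.JSONDecodeError:
            return None
        return parsed if isinstance(parsed, dict) else None
    return None
```

This only works around the symptom. If the provider is OpenAI-compatible, its tool_choice request parameter can force a specific function call, which may be worth checking as the actual fix.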
Changes:
- [COMPLETE] Added the subject at the beginning of the body, since it might contain important info
- [IN PROGRESS] Implement code for the 1000-example dataset
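
The two preprocessing steps mentioned in these notes (removing links, and putting the subject at the start of the body) could be combined roughly as follows. This is a minimal sketch under assumptions: the function name build_model_input, the "Subject:" prefix, and the URL regex are all illustrative and not taken from the project code.

```python
import re

# Assumed pattern: http(s) URLs and bare www. links, up to whitespace or < > ".
URL_PATTERN = re.compile(r'(?:https?://|www\.)[^\s<>"]+', re.IGNORECASE)


def build_model_input(subject, body):
    """Strip links from the body, then prepend the subject line."""
    text = URL_PATTERN.sub("", body)
    # Collapse runs of spaces/tabs left behind where links were removed.
    text = re.sub(r"[ \t]{2,}", " ", text)
    # Drop trailing whitespace on each line and around the whole text.
    text = "\n".join(line.rstrip() for line in text.splitlines()).strip()
    return f"Subject: {subject.strip()}\n\n{text}"
```

Stripping links before the model call shortens link-heavy newsletter emails, which is the case reported as slow in the issue list.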
Labeling Accuracies:
- Refunds - 1/1
- Cancellations - 5/5 (all were both refund and cancellation)
- Newsletters - 2/5 (other 3 were OCS)
- OCS - 4/5 (other was newsletter)
- Orders - 2/3 (other was only shipping/tracking email)
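
For tracking these numbers across prompt versions, the tallies above can be summarized with a short script. The counts are copied from the list; the overall figure is a simple pooled ratio, which assumes each test email is counted once under its expected label.

```python
# (correct, total) per expected label, copied from the list above.
RESULTS = {
    "Refunds": (1, 1),
    "Cancellations": (5, 5),
    "Newsletters": (2, 5),
    "OCS": (4, 5),
    "Orders": (2, 3),
}


def summarize(results):
    """Return per-label accuracy plus pooled correct/total counts."""
    per_label = {
        label: correct / total for label, (correct, total) in results.items()
    }
    total_correct = sum(correct for correct, _ in results.values())
    total = sum(count for _, count in results.values())
    return per_label, total_correct, total


per_label, total_correct, total = summarize(RESULTS)
for label, accuracy in per_label.items():
    print(f"{label}: {accuracy:.0%}")
# Pooled over all labels: 14/19, about 73.7%.
print(f"Overall: {total_correct}/{total} = {total_correct / total:.1%}")
```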