DIGITALIZATION: DEVELOPMENT OF A PDF SCRAPING PROGRAM TO AUTOMATE TASKS FROM OPERATIONS TEAMS IN MAERSK CUSTOMS SERVICES

dc.contributor.author: Gee, Theophilus Ariel
dc.contributor.author: Eng, Kho I
dc.contributor.author: Hahn, Matthias
dc.date.accessioned: 2026-05-28T04:38:40Z
dc.date.issued: 2024-07-25
dc.description.abstract: PDF invoice documents are used extensively across many stages of business processes, regardless of industry. In a customs service company, invoices are a crucial information medium that passes between different organizations. The time required to extract information from PDF invoices hinders business operations, especially when the volume is large. Therefore, this thesis offers a solution by implementing a machine learning approach for scraping data out of PDF invoices, using the object detection model YOLOv8 together with a novel AutoTrainer program that helps streamline model training and training data preparation. A divide and conquer strategy is introduced as a crucial step in reducing noise and increasing detection accuracy. Additionally, this thesis compares an existing rule-based approach with the machine learning approach. The results show that the machine learning approach with object detection is superior to the rule-based approach. Although reaching perfect accuracy remains a challenge, the reusability and flexibility it offers outweigh its limitations, especially when compared to the rule-based approach.
dc.identifier.uri: https://dspace-repository.sgu.ac.id/handle/123456789/255
dc.language.iso: en
dc.publisher: Swiss German University
dc.subject: Document Analysis and Recognition
dc.subject: PDF invoice scraping
dc.subject: Table Detection
dc.subject: YOLO Object Detection
dc.subject: Robotic Process Automation
dc.title: DIGITALIZATION: DEVELOPMENT OF A PDF SCRAPING PROGRAM TO AUTOMATE TASKS FROM OPERATIONS TEAMS IN MAERSK CUSTOMS SERVICES
dc.type: Thesis

Files

Original bundle

Now showing 1 - 5 of 6
Name: COVER.pdf; Size: 196.08 KB; Format: Adobe Portable Document Format
Name: CHAPTER 1.pdf; Size: 201.34 KB; Format: Adobe Portable Document Format
Name: CHAPTER 2.pdf; Size: 116.43 KB; Format: Adobe Portable Document Format
Name: CHAPTER 3.pdf; Size: 1.17 MB; Format: Adobe Portable Document Format
Name: CHAPTER 4.pdf; Size: 1.33 MB; Format: Adobe Portable Document Format

License bundle

Now showing 1 - 1 of 1
Name: license.txt; Size: 1.71 KB; Format: Item-specific license agreed to upon submission