Project Overview
Summary
In this project, I successfully converted a large dataset from PDF format into Excel to enable more efficient data analysis and manipulation. The project involved the extraction of structured and unstructured data from multiple PDF documents, followed by transformation and organization in an Excel-friendly format.
Location: USA
Start Date: 24.08.2024
Delivery Date: 25.08.2024
Project Status: Completed
Project Details
Key responsibilities:
- Reviewed and analyzed the content of over 200 pages of PDF documents to identify data patterns and structures.
- Used Adobe Acrobat Pro DC and custom Python scripts to extract text and tables from the PDFs.
- Cleaned and standardized the extracted data to ensure consistency and accuracy in Excel.
- Applied Excel formulas and pivot tables to organize data for enhanced usability and reporting.
- Validated the data by cross-referencing it with original PDF documents to ensure 100% accuracy.
- Delivered a final Excel file that was fully formatted and ready for analysis, saving the client significant time in manual data entry.
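The cleaning and standardization step above can be illustrated with a minimal pandas sketch. The project's actual scripts are not shown here, so the helper name `clean_extracted_table`, the column names, and the sample values below are all hypothetical. The sketch assumes the extracted table is already loaded into a DataFrame.

```python
import pandas as pd


def clean_extracted_table(raw, numeric_columns=()):
    """Standardize a table extracted from a PDF (hypothetical helper)."""
    df = raw.copy()
    # Normalize header text: trim it and collapse runs of whitespace.
    df.columns = [" ".join(str(col).split()) for col in df.columns]
    # Trim whitespace in text cells; non-string values are left untouched.
    df = df.apply(
        lambda col: col.map(lambda v: v.strip() if isinstance(v, str) else v)
    )
    # Treat empty strings as missing, then drop rows that hold no data at all.
    df = df.replace("", pd.NA).dropna(how="all")
    # Remove exact duplicate rows and renumber the index.
    df = df.drop_duplicates().reset_index(drop=True)
    # Convert the named columns to numbers, removing thousands separators.
    for col in numeric_columns:
        as_text = df[col].astype(str).str.replace(",", "", regex=False)
        df[col] = pd.to_numeric(as_text, errors="coerce")
    return df


# Hypothetical sample resembling raw PDF output: padded text, a duplicate
# row, and an empty row.
raw = pd.DataFrame(
    {
        " Invoice  No ": ["A-001 ", "A-002", "A-002", ""],
        "Amount": ["1,200.50", " 300 ", " 300 ", ""],
    }
)
clean = clean_extracted_table(raw, numeric_columns=["Amount"])
print(clean)
```

A cleaned frame like this could then be written out with `clean.to_excel("output.xlsx", index=False)`, which requires an Excel writer engine such as openpyxl to be installed. The file name is a placeholder.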
Technologies and tools used:
- Adobe Acrobat Pro DC for initial data extraction
- Python (Pandas) for automated data transformation
- Microsoft Excel for final data organization and formatting
Outcome:
This project resulted in a well-structured Excel file that allowed the client to easily manipulate and analyze their data, improving their operational efficiency. The data conversion saved over 50 hours of manual data entry and was completed with a 100% accuracy rate.
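The accuracy figure above is the reported result; the write-up does not describe how the cross-referencing was carried out. As one hypothetical way to automate part of such a check, the sketch below compares extracted rows against a reference sample keyed by hand from the PDF and reports the share of matching cells. The function name, field names, and values are illustrative assumptions, not the project's data.

```python
def cell_accuracy(expected_rows, actual_rows):
    """Compare two tables (lists of dicts) cell by cell.

    Returns (matching_cells, total_cells, mismatches), where each mismatch
    is a tuple of (row_index, field, expected_value, actual_value).
    """
    if len(expected_rows) != len(actual_rows):
        raise ValueError("row counts differ; align the tables before comparing")
    matching = 0
    total = 0
    mismatches = []
    for row_index, (expected, actual) in enumerate(zip(expected_rows, actual_rows)):
        for field, expected_value in expected.items():
            total += 1
            actual_value = actual.get(field)
            if expected_value == actual_value:
                matching += 1
            else:
                mismatches.append((row_index, field, expected_value, actual_value))
    return matching, total, mismatches


# Hypothetical reference sample (hand-keyed from the PDF) and extracted rows.
reference = [
    {"invoice": "A-001", "amount": 1200.5},
    {"invoice": "A-002", "amount": 300.0},
]
extracted = [
    {"invoice": "A-001", "amount": 1200.5},
    {"invoice": "A-002", "amount": 30.0},  # deliberately wrong value
]

matching, total, mismatches = cell_accuracy(reference, extracted)
accuracy = 100 * matching / total
print(f"{accuracy:.1f}% of cells match")
for row_index, field, expected_value, actual_value in mismatches:
    print(f"row {row_index}, {field}: expected {expected_value!r}, got {actual_value!r}")
```

Listing each mismatch with its row and field makes it easy to go back to the source page and correct the value, rather than only knowing that the totals disagree.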