
Antalis (KPP Group) — Competitive Intelligence (Scraping)

Automated scraping system turning competitor websites into structured commercial data

Python · Selenium · BeautifulSoup · Pandas · Excel/CSV

The Problem

When I joined Antalis (KPP Group), competitive intelligence relied entirely on manual work. Monitoring prices meant finding the matching product on a competitor's site, noting the price, entering it into Excel, then dragging formulas across the sheet to track price changes and compare positioning against our own rates, product by product, competitor by competitor. Supplier images? Downloaded one by one, renamed by hand, then resized. Sales prospects? Identified manually across different websites and copied into a spreadsheet. Everything was done by hand, when it was done at all.

Approach

1. I started by proactively offering help to a market manager who was handling price monitoring of our biggest competitor. I built a first targeted Python/Selenium scraper that produced a clean, ready-to-use Excel file. The result convinced the team, and we extended it to all main competitors.

2. A second market manager then asked whether I could retrieve stock levels from a specific competitor. I built the module, automated the export, and applied the same logic to other targets.

3. The Master Data team asked me to handle the prospects side: automatically extracting company information from various listing websites (previously done by hand) and producing a structured table by city and sales territory for field teams.

4. Image scraping was added to automate the retrieval and normalization of product visuals from competitor and supplier websites, eliminating manual downloading and renaming.
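The first price scraper (step 1) can be sketched as follows. The CSS selectors, markup, price format and column names are invented for illustration; the real parsers were target-specific.

```python
# Minimal sketch of the step-1 price scraper: parse a listing page,
# then export the rows to a ready-to-use Excel file.
from bs4 import BeautifulSoup
import pandas as pd

def parse_product_listing(html: str) -> list[dict]:
    """Extract product reference and price from one listing page."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.product-card"):  # hypothetical selector
        ref = card.select_one(".sku").get_text(strip=True)
        raw_price = card.select_one(".price").get_text(strip=True)
        rows.append({
            "reference": ref,
            # "4,90 €" -> 4.9 (French decimal comma, euro sign stripped)
            "price": float(raw_price.replace("€", "").replace(",", ".")),
        })
    return rows

def export_to_excel(rows: list[dict], path: str) -> None:
    """Write the scraped rows as a clean Excel file for the team."""
    pd.DataFrame(rows).to_excel(path, index=False)

html = ('<div class="product-card">'
        '<span class="sku">A4-80G</span>'
        '<span class="price">4,90 €</span></div>')
print(parse_product_listing(html))
```

In production the HTML came from requests or Selenium rather than a literal string, but the parse-then-export shape stayed the same.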

Technical Details

Anti-bot: HTTP header rotation, randomized delays between requests, redirect and error page handling. For more protected sites, Selenium with non-headless browser control.
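A minimal sketch of that layer, assuming a small hand-kept User-Agent pool; the agent strings and delay bounds below are examples, not the production values. The redirect and error-page handling sat around the actual HTTP call (`allow_redirects=True` plus a status check) and is omitted here to keep the sketch self-contained.

```python
# Sketch of the anti-bot layer: rotate User-Agent headers and insert
# randomized delays so request timing doesn't look scripted.
import random
import time

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/124.0",
]

def random_headers() -> dict:
    """Pick a fresh User-Agent for each request."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "fr-FR,fr;q=0.9",
    }

def polite_pause(low: float = 1.5, high: float = 4.0) -> float:
    """Sleep a random interval between requests; returns the delay used."""
    delay = random.uniform(low, high)
    time.sleep(delay)
    return delay
```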

Heterogeneous HTML structures: each competitor has a different architecture. Built target-specific parsers with CSS selectors and XPath expressions robust to minor layout variations.
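One way to keep those per-target parsers manageable is a registry mapping each site to its own selector set, so layout changes only touch one entry. The site names and selectors below are invented:

```python
# Per-competitor selector registry: one parser core, one selector set
# per target site. Names and selectors are illustrative.
from bs4 import BeautifulSoup

SELECTORS = {
    "competitor_a": {"name": "h2.title", "price": "span.price-now"},
    "competitor_b": {"name": ".prod-name", "price": ".prix_ttc"},
}

def parse(site: str, html: str) -> dict:
    """Extract the registered fields from one product page."""
    sel = SELECTORS[site]
    soup = BeautifulSoup(html, "html.parser")
    return {field: soup.select_one(css).get_text(strip=True)
            for field, css in sel.items()}

html_a = '<h2 class="title">Copy paper</h2><span class="price-now">4.90</span>'
print(parse("competitor_a", html_a))  # {'name': 'Copy paper', 'price': '4.90'}
```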

Deferred JavaScript rendering: some e-commerce pages and Eloqua-hosted pages load their content via JavaScript after the initial page load. Resolved using Selenium with explicit waits (WebDriverWait) to ensure elements are present in the DOM before extraction.

Delivery adapted to user profiles: non-technical teams get a standalone executable (.exe) that launches the script and directly outputs the Excel file. Others submit ad-hoc requests processed on demand.

Challenges & Solutions

The biggest challenge wasn't technical; it was maintenance. Competitor websites change their HTML structure without warning. I set up simple alerts: if the scraper doesn't find what it's looking for, it logs the error and sends me a notification instead of silently producing an empty Excel file. This let me detect and fix breakages quickly, without the teams ever noticing.
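The alerting logic can be sketched like this. `notify()` stands in for whatever channel actually delivers the message (mail, chat webhook), and the selector is an example; `soup` is duck-typed here but would be a BeautifulSoup object in practice.

```python
# Breakage detection: a missing element is treated as a probable layout
# change and raised as an alert, never silently written as empty data.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scraper")

def notify(message: str) -> None:
    # Placeholder: in production this sent a notification to the maintainer.
    log.warning("ALERT: %s", message)

def extract_or_alert(soup, css: str, site: str):
    """Return the element's text, or flag a probable layout change."""
    node = soup.select_one(css)
    if node is None:
        log.error("Selector %r found nothing on %s, layout may have changed",
                  css, site)
        notify(f"Scraper broken on {site}: selector {css!r} returned nothing")
        return None
    return node.get_text(strip=True)
```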

Data normalization across competitors was complex: each site presents prices, units and product references differently. I built a mapping layer that matches Antalis (KPP Group) references with equivalent products at each competitor, despite completely different naming conventions.
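The mapping layer amounts to a hand-maintained correspondence table joined onto the scraped data; the references and prices below are invented for illustration.

```python
# Reference mapping: join internal references to each competitor's own
# product codes, then compare prices side by side.
import pandas as pd

# Hand-maintained correspondence table (illustrative values)
mapping = pd.DataFrame({
    "internal_ref": ["KPP-1001", "KPP-1002"],
    "competitor_ref": ["X-778", "X-912"],
})

# Output of the scraper for one competitor (illustrative values)
scraped = pd.DataFrame({
    "competitor_ref": ["X-778", "X-912"],
    "competitor_price": [4.90, 7.20],
})

# Left join keeps every internal reference, even if the scrape missed one
compared = mapping.merge(scraped, on="competitor_ref", how="left")
print(compared[["internal_ref", "competitor_price"]])
```

A left join was the right choice here: an internal reference with no scraped match shows up as a gap to investigate rather than disappearing from the report.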

Results & Impact

  • Weekly automated price monitoring across all main competitors — zero manual action required from teams.
  • Est. 5–8 hrs/week saved across sales and marketing teams.
  • Sales teams have a structured prospect table by geographic zone, updated on demand.
  • Product images retrieved and normalized automatically — no more manual downloading.
  • 4 modules deployed in production, used by multiple teams with different levels of technical skill.

Architecture

Trigger (manual via .exe or weekly schedule) → Selenium/BeautifulSoup targets competitor site → anti-bot handling and JavaScript waits → HTML parsing and data extraction → normalization layer and reference mapping → structured Excel export → error notification if anomaly detected.
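That flow can be sketched as a single orchestrator. Every step below is a stub standing in for a real module (Selenium fetch, per-site parser, pandas export); the markup and file name are invented.

```python
# Pipeline sketch: fetch -> parse -> normalize -> export, with any
# anomaly routed to an alert instead of an empty output file.
def fetch(site: str) -> str:
    return '<span class="price">4,90</span>'  # stub: Selenium/requests in prod

def parse(html: str) -> dict:
    return {"price": html.split(">")[1].split("<")[0]}  # stub parser

def normalize(raw: dict) -> dict:
    return {"price_eur": float(raw["price"].replace(",", "."))}

def export_excel(site: str, row: dict) -> str:
    return f"{site}_prices.xlsx"  # stub: pandas .to_excel in prod

def run_pipeline(site: str) -> str:
    try:
        row = normalize(parse(fetch(site)))
        return export_excel(site, row)
    except Exception as exc:
        return f"ALERT:{exc}"  # anomaly -> notification, never silent failure
```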

What I Learned

This project taught me that the real value of an internal tool isn't measured by its technical complexity, but by its adoption rate. An executable that a non-technical team can launch with a double-click has more impact than a sophisticated pipeline nobody uses. I also learned to design for maintainability from the start — alerts and logs aren't optional, they're what keeps a tool reliable over time.