In rule-based web scraping, the slightest change in website layout breaks the process, prompting a script overhaul to adapt to the new layout. With machine learning (ML), you don’t have to set up or readjust a dedicated parser for each individual web page. The trained model keeps recognizing prices, descriptions, or whatever else it was trained to extract, even after layout changes.
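To make the fragility concrete, here is a minimal sketch of a rule-based parser. The page snippets, class names, and rule are invented for illustration: the extraction rule hard-codes the current layout, so a simple class rename silently breaks it.

```python
import re

# Two snapshots of the same (hypothetical) product page:
# the site later renamed its price element's class.
OLD_LAYOUT = '<span class="price">$19.99</span>'
NEW_LAYOUT = '<span class="amount js-price">$19.99</span>'

# A rule-based parser bakes the layout into the rule itself.
PRICE_RULE = re.compile(r'<span class="price">([^<]+)</span>')

def parse_price(html: str):
    """Return the price text, or None when the rule no longer matches."""
    match = PRICE_RULE.search(html)
    return match.group(1) if match else None

print(parse_price(OLD_LAYOUT))  # $19.99
print(parse_price(NEW_LAYOUT))  # None -- the rule silently breaks
```

An ML-based parser sidesteps this by working from the page's content rather than its markup, which is the approach the webinar explores.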

Tune in to the Oxylabs webinar to grasp the ins and outs of ML-based parsing. In it, Tadas Gedgaudas, a developer at Oxylabs, shares his knowledge of large language models – ChatGPT in this case – and how they integrate into the web scraping process.

Tadas covers the following:

➡️ Nuances of structuring data with and without ML.
➡️ A walkthrough of getting, preparing, and submitting data to ChatGPT.
➡️ A detailed demo of combining ChatGPT with Oxylabs Web Scraper API to scrape and parse web pages without building your own tools.
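The get-prepare-submit flow from the walkthrough can be sketched in a few lines. This is not the webinar's own code: the helper names and the prompt wording are assumptions, and the raw HTML is taken as already fetched (for example, via Web Scraper API). The preparation step strips markup so the text fits a model's context window, and the prompt asks ChatGPT for JSON so the reply is machine-readable.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style blocks."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def html_to_text(html: str, max_chars: int = 12_000) -> str:
    """Prepare step: reduce raw HTML to plain text that fits in a prompt."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)[:max_chars]

def build_prompt(page_text: str) -> str:
    """Submit step: ask the model for structured JSON, not free-form prose."""
    return (
        "Extract the product title, price, and description from the page "
        'text below. Respond with one JSON object using the keys "title", '
        '"price", and "description". Page text:\n\n' + page_text
    )
```

The resulting prompt string would then be sent to ChatGPT through the OpenAI chat API, and the JSON reply parsed with `json.loads`; the webinar demo shows the full end-to-end version of this pipeline.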

The webinar is an essential stepping stone for developers and decision-makers toward understanding how ML-enabled parsing saves time, drastically reduces maintenance, and turns raw website content into structured data.

For your convenience, Tadas has provided code samples from his presentation. You can access the open-source Oxy® Parser library here: https://github.com/oxylabs/OxyParser
Tadas Gedgaudas
Developer at Oxylabs
From the very beginning of his software development career, Tadas has focused on web data extraction. In fact, his very first project was a web scraper. As a web scraping engineer, Tadas is product-minded and, one could say, obsessed with making software as performant as possible. To that end, he practices productivity tracking and even dedicates his pastime to crafting an open-source ML-powered data parsing library.