Warsaw Events: Scrape Waw4free.pl
Executive Summary
Let's get this party started, guys! We're adding waw4free.pl as a new source for our event scraper, focused on free events happening in Warsaw, Poland. This is a big win because it will help us find even more cool stuff for you to do! Think of it as a local guide just for Warsaw, much like Karnet already is for Kraków. The website is in Polish, so we'll need to get our Polish game on, but don't worry, we've got a plan! We estimate medium complexity, about the same effort as the Karnet integration.
- Website: https://waw4free.pl/
- Target: Free Events in Warsaw, Poland
- Language: Polish
- Priority: 30-40 (Local/Regional)
- Pattern: Multi-stage HTML scraper
- Complexity: Medium
waw4free.pl is the go-to spot for free events in Warsaw: concerts, workshops, exhibitions, theater, sports, and family fun. Each listing has the details we need (where, when, and what) plus a link to the event organizer. Adding it will be super helpful for everyone looking for free things to do in Warsaw.
Website Technical Deep Dive
Alright, let's dive into the technical details, so you know how we'll get this done.
URL Structure
Here's how the site's URLs are laid out, which tells us where to grab all the juicy event details.
- Homepage: https://waw4free.pl/
- Event Detail Pages: /wydarzenie-{id}-{slug} (e.g., /wydarzenie-144172-black-maze-4-labirynt-strachu)
- Category Listing: /warszawa-darmowe-{category} (e.g., /warszawa-darmowe-koncerty)
- Event ID: We'll grab this from the URL (like 144172).
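Both URL shapes are regular enough to match with a pattern. Below is a minimal Python sketch of the logic only (the production code would live in the Elixir modules described later; classify_path and its return values are hypothetical). It assumes IDs are all digits and that slugs and category names use lowercase ASCII letters, digits, and hyphens, as in the examples above.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns inferred from the example URLs above.
DETAIL_RE = re.compile(r"^/wydarzenie-(\d+)-([a-z0-9-]+)/?$")
CATEGORY_RE = re.compile(r"^/warszawa-darmowe-([a-z0-9-]+)/?$")


def classify_path(url: str):
    """Return ("detail", event_id), ("category", slug), or ("other", None)."""
    path = urlparse(url).path  # works for full URLs and bare paths alike
    if m := DETAIL_RE.match(path):
        return ("detail", m.group(1))  # event ID kept as a string
    if m := CATEGORY_RE.match(path):
        return ("category", m.group(1))
    return ("other", None)
```

Keeping the ID as a string avoids any assumption about its maximum size.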
Data Fields We'll Be Snagging
Here's what info we'll be pulling from each event page. It's the good stuff!
Event Detail Pages Provide:
- ✅ Title (the headline)
- ✅ Category (like "concerts" or "workshops")
- ✅ Date (in Polish, of course: "poniedziałek, 3 listopada 2025")
- ✅ Time (24-hour format: "15:00")
- ✅ Venue name and address (e.g., "Galeria Północna, ul. Światowida 17")
- ✅ District (Warsaw areas: Białołęka, Praga-Południe, Śródmieście, etc.)
- ✅ Full description (in HTML, so we get all the formatting)
- ✅ Event image (a picture is worth a thousand words!)
- ✅ Source URL (link to the event organizer's website)
- ✅ Google Maps link (for finding the place)
- ⚠️ Sometimes, there's info about voluntary donations: "(dobrowolna zrzutka za udział)"
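The donation note is easy to miss inside a free-text description. A minimal Python sketch of one way to flag it, assuming the marker phrase appears verbatim (apart from case, markup, and line breaks) somewhere in the description HTML; has_voluntary_donation is a hypothetical name, and the crude tag stripping is a simplification, not a full HTML parser.

```python
import html
import re

# Marker phrase as quoted in the field list above.
DONATION_MARKER = "dobrowolna zrzutka za udział"


def has_voluntary_donation(description_html: str) -> bool:
    """True if the description mentions a voluntary donation."""
    # Replace tags with spaces and decode entities so markup can't split the phrase.
    text = html.unescape(re.sub(r"<[^>]+>", " ", description_html))
    # Collapse all whitespace, then compare case-insensitively.
    text = re.sub(r"\s+", " ", text).casefold()
    return DONATION_MARKER.casefold() in text
```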
Category Listing Pages:
- Event cards with titles, categories, dates, times, districts
- 60+ events per category page
- No pagination (all events load on one page)
- Multiple categories per event possible
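Because one event can sit in several categories, the same detail link will show up on more than one listing page, so links should be deduplicated by event ID. A minimal Python sketch using the standard library's html.parser, assuming event cards link to detail pages through ordinary a href attributes; the class, function, and host set are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

BASE_URL = "https://waw4free.pl/"
SITE_HOSTS = {"waw4free.pl", "www.waw4free.pl"}  # assumption: no other hostnames
DETAIL_RE = re.compile(r"^/wydarzenie-(\d+)-[a-z0-9-]+/?$")


class DetailLinkCollector(HTMLParser):
    """Collects detail-page URLs from <a href> tags, keyed by event ID."""

    def __init__(self):
        super().__init__()
        self.by_id = {}  # event_id -> absolute URL; first occurrence wins

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(BASE_URL, href)  # resolve relative links
        parsed = urlparse(absolute)
        match = DETAIL_RE.match(parsed.path)
        if parsed.netloc in SITE_HOSTS and match:
            self.by_id.setdefault(match.group(1), absolute)


def collect_detail_urls(listing_pages):
    """Merge links from several listing pages into one dict, deduplicated by event ID."""
    collector = DetailLinkCollector()
    for page_html in listing_pages:
        collector.feed(page_html)
    collector.close()
    return collector.by_id
```

Since there is no pagination, each category needs only one fetch before its HTML is fed in here.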
Event Categories (Polish)
These are the types of events we can find, all in Polish, naturally.
- koncerty (concerts)
- warsztaty (workshops)
- wystawy (exhibitions)
- teatr (theater)
- sport (sports)
- dla-dzieci (for children)
- festiwale (festivals)
- inne (other)
Warsaw Districts
We'll make sure to note which part of Warsaw the event is in.
Białołęka, Praga-Południe, Praga-Północ, Śródmieście, Wawer, Wilanów, Żoliborz, Mokotów, Ursynów, Wola, Targówek, Bemowo, Bielany, Ochota, Rembertów, Wesoła, Włochy, Ursus
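District names arrive as plain Polish text on cards and detail pages. A minimal Python sketch for picking one out, covering all 18 Warsaw districts and assuming the name appears as a whole word in the card or address text; find_district is hypothetical. The lookarounds keep short names like "Wola" from matching inside street names such as "Wolanowska".

```python
import re

# All 18 districts (dzielnice) of Warsaw, in canonical spelling.
WARSAW_DISTRICTS = [
    "Bemowo", "Białołęka", "Bielany", "Mokotów", "Ochota", "Praga-Południe",
    "Praga-Północ", "Rembertów", "Śródmieście", "Targówek", "Ursus", "Ursynów",
    "Wawer", "Wesoła", "Wilanów", "Włochy", "Wola", "Żoliborz",
]

# One alternation; (?<!\w) and (?!\w) require whole-word matches.
_DISTRICT_RE = re.compile(
    r"(?<!\w)(" + "|".join(map(re.escape, WARSAW_DISTRICTS)) + r")(?!\w)",
    re.IGNORECASE,
)
# Map case-folded spellings back to the canonical names.
_CANONICAL = {name.casefold(): name for name in WARSAW_DISTRICTS}


def find_district(text: str):
    """Return the first Warsaw district named in text, or None."""
    m = _DISTRICT_RE.search(text)
    return _CANONICAL[m.group(1).casefold()] if m else None
```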
The Nitty-Gritty: Technical Requirements
Here’s how we'll build this thing, step by step.
1. Polish Language Support
Date Parser Plugin
We need a way to read those Polish dates. It’s a whole new file we'll need to create!
New file needed: lib/eventasaurus_discovery/shared/parsers/date_patterns/polish.ex
What it needs to do:
- Read Polish dates like: "poniedziałek, 3 listopada 2025"
- Know the Polish names for days: poniedziałek (Monday), wtorek (Tuesday), środa (Wednesday), czwartek (Thursday), piątek (Friday), sobota (Saturday), niedziela (Sunday)
- Know the Polish month names in the genitive case, which is the form used in dates: stycznia, lutego, marca, kwietnia, maja, czerwca, lipca, sierpnia, września, października, listopada, grudnia
- Plug into our existing MultilingualDateParser (which already handles French and English)
- Interpret parsed dates and times as local Warsaw time (timezone: "Europe/Warsaw") and give us back a DateTime in UTC
Need a reference? Check out lib/eventasaurus_discovery/shared/parsers/date_patterns/french.ex to see how it's done.
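The parsing rules above fit in a few lines. Here is a minimal Python sketch of the logic only (the real plugin is the Elixir module named above, and its hookup to MultilingualDateParser isn't shown). parse_polish_datetime and its weekday sanity check are hypothetical; it assumes dates follow the "weekday, day month year" shape with the 24-hour time as a separate string, and it needs Python 3.9+ with the IANA time zone database available.

```python
import re
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

WARSAW = ZoneInfo("Europe/Warsaw")

# Genitive month names, as they appear in dates ("3 listopada 2025").
MONTHS = {
    "stycznia": 1, "lutego": 2, "marca": 3, "kwietnia": 4, "maja": 5,
    "czerwca": 6, "lipca": 7, "sierpnia": 8, "września": 9,
    "października": 10, "listopada": 11, "grudnia": 12,
}
# Weekday names indexed like datetime.weekday() (Monday == 0).
WEEKDAYS = ["poniedziałek", "wtorek", "środa", "czwartek", "piątek", "sobota", "niedziela"]

# Optional "weekday," prefix, then day, month name, year.
DATE_RE = re.compile(r"^(?:(\w+),\s*)?(\d{1,2})\s+(\w+)\s+(\d{4})$")


def parse_polish_datetime(date_text: str, time_text: str = "00:00") -> datetime:
    """Parse e.g. ("poniedziałek, 3 listopada 2025", "15:00") into an aware UTC datetime."""
    m = DATE_RE.match(date_text.strip().lower())
    if not m:
        raise ValueError(f"unrecognized Polish date: {date_text!r}")
    weekday, day, month_name, year = m.groups()
    if month_name not in MONTHS:
        raise ValueError(f"unknown Polish month: {month_name!r}")
    hour, minute = (int(part) for part in time_text.strip().split(":"))
    # Attach Warsaw local time; zoneinfo applies CET/CEST automatically.
    local = datetime(int(year), MONTHS[month_name], int(day), hour, minute, tzinfo=WARSAW)
    # Sanity check: a weekday name, when present, must agree with the date.
    if weekday is not None and WEEKDAYS[local.weekday()] != weekday:
        raise ValueError(f"weekday {weekday!r} does not match {local.date()}")
    return local.astimezone(timezone.utc)
```

For the example in the list above, 15:00 on 3 November 2025 is winter time (UTC+1), so the result is 14:00 UTC.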
Category Mapping
We also need to translate the Polish categories into something we understand. Time for a new file!
New file needed: priv/category_mappings/waw4free.yml
Polish → Internal Taxonomy Mapping:
# waw4free.pl category mappings
concerts:
- koncerty
workshops:
- warsztaty
exhibitions:
- wystawy
theater:
- teatr
sports:
- sport
family:
- dla-dzieci
festivals:
- festiwale
other:
- inne
Need a reference? See how we did it for priv/category_mappings/karnet.yml and priv/category_mappings/sortiraparis.yml.
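A minimal Python sketch of how the mapping might be applied once loaded. The YAML is inlined as a dict so the sketch needs no YAML library; map_categories is hypothetical, and falling back to "other" for unknown slugs is our assumption, not something the mapping file specifies. Since an event can carry several categories, it maps a list and keeps the results unique and in order.

```python
# Mirrors priv/category_mappings/waw4free.yml (internal category -> Polish slugs).
CATEGORY_MAPPING = {
    "concerts": ["koncerty"],
    "workshops": ["warsztaty"],
    "exhibitions": ["wystawy"],
    "theater": ["teatr"],
    "sports": ["sport"],
    "family": ["dla-dzieci"],
    "festivals": ["festiwale"],
    "other": ["inne"],
}

# Invert once so lookups go Polish slug -> internal category.
SLUG_TO_CATEGORY = {
    slug: internal for internal, slugs in CATEGORY_MAPPING.items() for slug in slugs
}


def map_categories(polish_slugs):
    """Map an event's Polish category slugs to unique internal categories, in order."""
    result = []
    for slug in polish_slugs:
        internal = SLUG_TO_CATEGORY.get(slug.strip().lower(), "other")
        if internal not in result:
            result.append(internal)
    return result or ["other"]
```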
2. Scraper Architecture
Directory Structure
Here’s how we'll organize all the code.
lib/eventasaurus_discovery/sources/waw4free/
├── source.ex # Configuration & metadata (Priority 30-40)
├── config.ex # Runtime settings (base_url, rate limits)
├── transformer.ex # Data transformation to unified format
├── client.ex # HTTP client with rate limiting
├── html_parser.ex # HTML parsing utilities
├── jobs/
│ ├── sync_job.ex # Index job: Scrape category listings
│ └── event_detail_job.ex # Detail job: Fetch individual events
└── README.md # Documentation
priv/category_mappings/waw4free.yml # Category mapping
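client.ex is described as an HTTP client with rate limiting, with the limits living in config.ex, but no numbers are given here. A minimal Python sketch of one simple policy, a fixed minimum gap between consecutive requests; MinIntervalLimiter is hypothetical, the clock and sleep are injectable only to make it testable, and any interval you pass is a placeholder to tune, not a known limit of the site.

```python
import time


class MinIntervalLimiter:
    """Blocks so that consecutive requests start at least `interval` seconds apart."""

    def __init__(self, interval: float, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # start time of the previous permitted request

    def wait(self) -> None:
        """Call before each request; sleeps only if the previous one was too recent."""
        now = self._clock()
        if self._last is not None:
            remaining = self._last + self.interval - now
            if remaining > 0:
                self._sleep(remaining)
                now += remaining
        self._last = now
```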
Pattern: It’s a two-step process, like the Karnet scraper:
- Stage 1 (SyncJob): Grab all the category listing pages and find event URLs.
- Stage 2 (EventDetailJob): Go to each event page and get all the details.
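The two stages can be sketched in Python with the network calls injected as plain functions, so the control flow is visible without an HTTP client. The category slugs come from the category list above; sync_stage, detail_stage, and the job dict shape are hypothetical, and in the real system Stage 2 would run as separately queued EventDetailJobs rather than a loop.

```python
from typing import Callable, Dict, Iterable, List, Tuple

BASE_URL = "https://waw4free.pl"
# Category slugs from the "Event Categories (Polish)" list.
CATEGORY_SLUGS = ["koncerty", "warsztaty", "wystawy", "teatr",
                  "sport", "dla-dzieci", "festiwale", "inne"]

# Hypothetical injected callables standing in for the HTTP client and HTML parser.
ListLinks = Callable[[str], Iterable[Tuple[str, str]]]  # listing URL -> (event_id, detail_url)
FetchEvent = Callable[[str], dict]  # detail URL -> parsed event


def sync_stage(list_detail_links: ListLinks) -> List[Dict[str, str]]:
    """Stage 1: read every category listing once; emit one job per unique event."""
    seen = set()
    jobs = []
    for slug in CATEGORY_SLUGS:
        listing_url = f"{BASE_URL}/warszawa-darmowe-{slug}"
        for event_id, detail_url in list_detail_links(listing_url):
            if event_id in seen:  # the same event can be listed under several categories
                continue
            seen.add(event_id)
            jobs.append({"event_id": event_id, "url": detail_url})
    return jobs


def detail_stage(jobs: List[Dict[str, str]], fetch_event: FetchEvent) -> List[dict]:
    """Stage 2: fetch each event page (a loop here; one queued job each in production)."""
    return [fetch_event(job["url"]) for job in jobs]
```

Deduplicating in Stage 1 means an event listed under three categories costs one detail fetch, not three.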
3. External ID Generation
How we’ll identify each event:
- We'll use the event ID from the URL as a unique external_id.
- URL format: /wydarzenie-{id}-{slug}
- Extract ID: Get 144172 from /wydarzenie-144172-black-maze-4-labirynt-strachu.
- External ID: We'll create something like `