TL;DR
Web Crawling Engineer: Develop and maintain web crawlers in Go to extract data from target websites, with a focus on automating data collection. Improve and optimize the existing web crawling infrastructure to maximize efficiency and adapt to new challenges.
Location: Primarily based in one of our European offices: Paris, France, or London, UK. We will prioritize candidates who either reside there or are open to relocating. Remote candidates based in one of the countries listed in this job posting (currently France, the UK, Germany, Belgium, the Netherlands, Spain, and Italy) will also be considered.
Company
hirify.global democratizes AI through high-performance, optimized, open-source, and cutting-edge models, products, and solutions.
What you will do
- Develop and maintain web crawlers using Go to extract data from target websites.
- Utilize headless browsing techniques, driven via the Chrome DevTools Protocol, to automate and optimize data collection processes.
- Collaborate with cross-functional teams to identify, scrape, and integrate data from APIs and web pages to support business objectives.
- Create and implement efficient parsing patterns using tokenizers, regular expressions, XPaths, and CSS selectors to ensure accurate data extraction.
- Design and manage distributed job queues using technologies such as Redis, Aerospike and Kubernetes to handle large-scale distributed crawling and processing tasks.
- Continuously improve and optimize existing web crawling infrastructure to maximize efficiency and adapt to new challenges.
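As a flavor of the parsing patterns mentioned above, here is a minimal Go sketch that extracts fields from HTML with a regular expression from the standard library. The HTML snippet, the `price` class name, and the `extractPrices` helper are all illustrative, not part of any real target site; production crawlers would typically pair this with a tokenizer or CSS-selector library.

```go
package main

import (
	"fmt"
	"regexp"
)

// extractPrices pulls the text content of <span class="price"> elements
// out of raw HTML using a regular expression -- one of several parsing
// patterns (tokenizers, XPaths, CSS selectors) used for data extraction.
func extractPrices(html string) []string {
	re := regexp.MustCompile(`<span class="price">([^<]+)</span>`)
	var prices []string
	for _, m := range re.FindAllStringSubmatch(html, -1) {
		prices = append(prices, m[1]) // m[1] is the captured group
	}
	return prices
}

func main() {
	page := `<div><span class="price">19.99</span><span class="price">5.00</span></div>`
	fmt.Println(extractPrices(page)) // [19.99 5.00]
}
```

Regular expressions are brittle against markup changes, which is one reason the role also calls for tokenizer- and selector-based extraction.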
Requirements
- Proficiency in Go (Golang)/Rust/Zig for building scalable and efficient web crawlers.
- Deep understanding of TCP, UDP, TLS, and HTTP/1.1, HTTP/2, and HTTP/3, and of web communication generally.
- Knowledge of HTML, CSS, and JavaScript for parsing and navigating web content.
- Familiarity with cloud platforms (AWS, GCP), orchestration (Kubernetes, Nomad), and containerization (Docker) for deployment.
- Mastery of queues, stacks, hash maps, and other data structures for efficient data handling.
- Ability to design and optimize algorithms for large-scale web crawling.
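To illustrate the kind of queue-based design the requirements hint at, the sketch below fans a queue of URLs out to a fixed pool of goroutine workers over a channel: an in-process analogue of the distributed job queues (Redis, Aerospike) the role manages. The `crawl` function, the `fetch` stand-in, and the example URLs are assumptions for illustration, not part of the actual stack.

```go
package main

import (
	"fmt"
	"sync"
)

// crawl distributes urls across a fixed number of workers via a channel
// acting as the job queue. fetch is a stand-in for a real HTTP fetch so
// the sketch stays self-contained and deterministic.
func crawl(urls []string, workers int, fetch func(string) string) []string {
	jobs := make(chan string)
	results := make(chan string, len(urls))

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs { // each worker drains the shared queue
				results <- fetch(u)
			}
		}()
	}

	for _, u := range urls {
		jobs <- u
	}
	close(jobs) // no more work; workers exit their range loops
	wg.Wait()
	close(results)

	var out []string
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	urls := []string{"https://example.com/a", "https://example.com/b"}
	out := crawl(urls, 2, func(u string) string { return "fetched " + u })
	fmt.Println(len(out)) // 2
}
```

In a distributed deployment the channel would be replaced by a shared queue (e.g. a Redis list) so that workers on separate Kubernetes pods can pull from the same backlog.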
Nice to have
- Experience with web archiving projects and tooling; open-source archiving experience is a big plus!
- Experience applying Machine Learning to improve crawling efficiency or accuracy.
- Experience with low-level networking programming and/or userspace TCP/IP stacks.
Culture & Benefits
- Competitive salary and equity
- Health insurance
- Transportation allowance
- Sport allowance
- Meal vouchers
- Generous parental leave policy
Hiring process
- Introduction call - 35 min
- Hiring Manager Interview - 30 min
- Live-coding Interview - 45 min