TL;DR
Web Crawling Engineer: Develop and maintain web crawlers in Go to extract data from target websites, with a focus on automating data collection. Improve and optimize the existing web crawling infrastructure to maximize efficiency and adapt to new challenges.
Location: Primarily based in one of our European offices: Paris, France, or London, UK. We will prioritize candidates who either reside there or are open to relocating. Remote candidates based in one of the countries listed in this job posting (currently France, the UK, Germany, Belgium, the Netherlands, Spain, and Italy) will also be considered.
Company
hirify.global democratizes AI through high-performance, optimized, open-source, and cutting-edge models, products, and solutions.
What you will do
- Develop and maintain web crawlers using Go to extract data from target websites.
- Utilize headless browsing techniques, driven via the Chrome DevTools Protocol, to automate and optimize data collection processes.
- Collaborate with cross-functional teams to identify, scrape, and integrate data from APIs and web pages to support business objectives.
- Create and implement efficient parsing patterns using tokenizers, regular expressions, XPaths, and CSS selectors to ensure accurate data extraction.
- Design and manage distributed job queues using technologies such as Redis, Aerospike and Kubernetes to handle large-scale distributed crawling and processing tasks.
- Continuously improve and optimize existing web crawling infrastructure to maximize efficiency and adapt to new challenges.
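As a flavor of the parsing patterns mentioned above, here is a minimal Go sketch that extracts fields from HTML with a regular expression from the standard library. The HTML snippet, the `price` class name, and the `extractPrices` helper are all illustrative, not part of any real target site; production crawlers would typically pair this with a tokenizer or CSS-selector library.

```go
package main

import (
	"fmt"
	"regexp"
)

// extractPrices pulls the text content of <span class="price"> elements
// out of raw HTML using a regular expression -- one of several parsing
// patterns (tokenizers, XPaths, CSS selectors) used for data extraction.
func extractPrices(html string) []string {
	re := regexp.MustCompile(`<span class="price">([^<]+)</span>`)
	var prices []string
	for _, m := range re.FindAllStringSubmatch(html, -1) {
		prices = append(prices, m[1]) // m[1] is the captured group
	}
	return prices
}

func main() {
	page := `<div><span class="price">19.99</span><span class="price">5.00</span></div>`
	fmt.Println(extractPrices(page)) // [19.99 5.00]
}
```

Regular expressions are brittle against markup changes, which is one reason the role also calls for tokenizer- and selector-based extraction.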
Requirements
- Proficiency in Go (Golang)/Rust/Zig for building scalable and efficient web crawlers.
- Deep understanding of TCP, UDP, TLS, and HTTP/1.1, HTTP/2, and HTTP/3, and of web communication generally.
- Knowledge of HTML, CSS, and JavaScript for parsing and navigating web content.
- Familiarity with cloud platforms (AWS, GCP), orchestration (Kubernetes, Nomad), and containerization (Docker) for deployment.
- Mastery of queues, stacks, hash maps, and other data structures for efficient data handling.
- Ability to design and optimize algorithms for large-scale web crawling.
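To illustrate the kind of queue-based design the requirements hint at, the sketch below fans a queue of URLs out to a fixed pool of goroutine workers over a channel: an in-process analogue of the distributed job queues (Redis, Aerospike) the role manages. The `crawl` function, the `fetch` stand-in, and the example URLs are assumptions for illustration, not part of the actual stack.

```go
package main

import (
	"fmt"
	"sync"
)

// crawl distributes urls across a fixed number of workers via a channel
// acting as the job queue. fetch is a stand-in for a real HTTP fetch so
// the sketch stays self-contained and deterministic.
func crawl(urls []string, workers int, fetch func(string) string) []string {
	jobs := make(chan string)
	results := make(chan string, len(urls))

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs { // each worker drains the shared queue
				results <- fetch(u)
			}
		}()
	}

	for _, u := range urls {
		jobs <- u
	}
	close(jobs) // no more work; workers exit their range loops
	wg.Wait()
	close(results)

	var out []string
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	urls := []string{"https://example.com/a", "https://example.com/b"}
	out := crawl(urls, 2, func(u string) string { return "fetched " + u })
	fmt.Println(len(out)) // 2
}
```

In a distributed deployment the channel would be replaced by a shared queue (e.g. a Redis list) so that workers on separate Kubernetes pods can pull from the same backlog.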
Nice to have
- Experience with web archiving projects and tooling; open-source archiving experience is a big plus!
- Experience applying Machine Learning to improve crawling efficiency or accuracy.
- Experience with low-level networking programming and/or userspace TCP/IP stacks.
Culture & Benefits
- Competitive salary and equity
- Health insurance
- Transportation allowance
- Sport allowance
- Meal vouchers
- Generous parental leave policy
Hiring process
- Introduction call - 35 min
- Hiring Manager Interview - 30 min
- Live-coding Interview - 45 min