Company hidden
Updated 2 months ago

Software Engineer (Python)

Work format: remote
Employment type: fulltime
Grade: middle
English: b2
Vacancy from Hirify.Global, a list of international tech companies

Job description

TL;DR

Software Engineer (Python): Building a large-scale data ingestion and classification system, with an emphasis on extracting data from diverse sources (web pages, APIs, PDFs), cleaning and normalizing it, and building search capabilities. The focus is on creating scalable, high-performance data pipelines using Python, Scrapy, Airflow, Kubernetes, AWS, and Spark.

Location: Remote

Company

hirify.global is the all-in-one, cloud-based platform helping auto repair shops run smarter, grow faster, and serve customers better.

What you will do

  • Design and build large-scale, distributed crawlers and their supporting infrastructure.
  • Develop and maintain data pipelines to extract data from large volumes of web pages, documents, PDFs, and APIs.
  • Help unify heterogeneous documents into a coherent data schema across varied source formats.
  • Preprocess and normalize raw data for downstream classification, ML/NLP, and search indexing.
  • Build APIs to expose structured, classified data via ElasticSearch/OpenSearch.
  • Collaborate with ML/NLP teams to integrate classification models into the pipeline.

Requirements

  • 3+ years of Python experience, including building crawling/scraping solutions at scale.
  • Experience working with REST APIs and with PDF processing (OCR, Tesseract, PyMuPDF, etc.).
  • Proficiency in data processing & search technologies (ElasticSearch/OpenSearch, NoSQL/SQL databases).
  • Hands-on experience with Airflow and Spark (EMR) or similar distributed systems.
  • Strong problem-solving skills in handling anti-scraping mechanisms and data scaling challenges.
  • Hands-on experience with AWS or GCP.

Nice to have

  • Familiarity with NLP and Machine Learning.
  • Experience with LLMs, NLP models, or ML frameworks (e.g., Hugging Face, spaCy, TensorFlow, PyTorch).
  • Prior experience in automated document classification.
  • Experience working in high-scale, production environments with petabytes of data.
  • Hands-on experience with Kubernetes.

Culture & Benefits

  • Enjoy the flexibility of remote work.
  • Competitive base salaries that reflect your value.
  • Generous Paid Time Off.
  • Comprehensive health benefits, including Medical, Dental, Vision, and Prescription coverage.
  • 401(k) Retirement Savings Plan with 100% employer match on contributions up to 6%.
  • Support for continuing education.
