Software Engineer Data Infrastructure
Job description
TL;DR
Software Engineer Data Infrastructure (Backend): Design, build, and operate scalable, fault-tolerant infrastructure for distributed training pipelines and multimodal data catalogs, with an emphasis on large-scale data processing, distributed compute frameworks, and infrastructure efficiency. The role focuses on building high-throughput data ingestion systems, ensuring traceability and quality control, and collaborating closely with research teams to accelerate AI training cycles.
Location: San Francisco, California, United States
Salary: $350,000 - $475,000 USD per year
Company
The company empowers humanity by advancing collaborative general intelligence, building widely used AI products and open-source tools.
What you will do
- Design and operate scalable, fault-tolerant distributed compute and data orchestration infrastructure for LLM research.
- Develop high-throughput systems for data ingestion, processing, deduplication, quality checks, and search.
- Build systems ensuring traceability, reproducibility, and quality control throughout the data lifecycle.
- Implement monitoring and alerting to maintain platform reliability and performance.
- Collaborate with research teams to unlock new features and accelerate training cycles.
Requirements
- Location: Must be based in San Francisco, California, United States
- Bachelor’s degree or equivalent experience in computer science or engineering.
- Proficiency in Python or Rust and experience with distributed compute frameworks like Apache Spark or Ray.
- Deep familiarity with cloud infrastructure, data lake architectures, and batch and streaming pipelines.
- Ability to operate across the stack and own projects end-to-end in a collaborative environment.
- Visa sponsorship is available but not guaranteed; candidates must commit to working through the visa process.
Nice to have
- Experience with Kafka, dbt, Terraform, and Airflow.
- Experience building web crawlers.
- Strong knowledge of file formats and storage systems like Parquet and Delta Lake.
- Experience scaling deduplication, data mining, and search systems.
- Proactive about documentation, testing, and tooling.
Culture & Benefits
- Generous health, dental, and vision benefits.
- Unlimited PTO and paid parental leave.
- Relocation support as needed.
- Collaborative and high-impact work environment.