LLM Data Engineer | United States | Fully Remote

Remote Full-time
We are seeking an experienced AI/LLM Data Engineer to build and maintain the data pipeline for our Generative AI platform. The ideal candidate will be well-versed in the latest Large Language Model (LLM) technologies and have a strong background in data engineering, with a focus on Retrieval-Augmented Generation (RAG) and knowledge-base techniques.

This role sits in the AI COE within DX Tech & Digital. As an AI/LLM Data Engineer, you will report to the Director, AI Solutions & Development, who oversees the AI COE. You will work on highly visible strategic projects, collaborating with cross-functional teams to define requirements and deliver high-quality AI solutions. The ideal candidate will have a passion for Generative AI and LLMs, with a proven track record of delivering innovative AI applications.

Responsibilities
• Design, implement, and maintain an end-to-end, multi-stage data pipeline for LLMs, including Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) data processes
• Identify, evaluate, and integrate diverse data sources and domains to support the Generative AI platform
• Develop and optimize data processing workflows for chunking, indexing, ingestion, and vectorization of both text and non-text data
• Benchmark and implement various vector stores, embedding techniques, and retrieval methods
• Create a flexible pipeline supporting multiple embedding algorithms, vector stores, and search types (e.g., vector search, hybrid search)
• Implement and maintain auto-tagging systems and data preparation processes for LLMs
• Develop tools for text and image data crawling, cleaning, and refinement
• Collaborate with cross-functional teams to ensure data quality and relevance for AI/ML models
• Work with data lakehouse architectures to optimize data storage and processing
• Integrate and optimize workflows using Snowflake and various vector store technologies

Requirements
• Master's degree in Computer Science, Data Science, or a related field
• 3-5 years of work experience in data engineering, preferably in AI/ML contexts
• Proficiency in Python, JSON, HTTP, and related tools
• Strong understanding of LLM architectures, training processes, and data requirements
• Experience with RAG systems, knowledge base construction, and vector databases
• Familiarity with embedding techniques, similarity search algorithms, and information retrieval concepts
• Hands-on experience with data cleaning, tagging, and annotation processes (both manual and automated)
• Knowledge of data crawling techniques and associated ethical considerations
• Strong problem-solving skills and ability to work in a fast-paced, innovative environment
• Familiarity with Snowflake and its integration in AI/ML pipelines
• Experience with various vector store technologies and their applications in AI
• Understanding of data lakehouse concepts and architectures
• Excellent communication, collaboration, and problem-solving skills
• Ability to translate business needs into technical solutions
• Passion for innovation and a commitment to ethical AI development
• Experience building LLM pipelines using frameworks such as LangChain, LlamaIndex, Semantic Kernel, and OpenAI functions
• Familiarity with LLM parameters such as temperature, top-k, and repeat penalty, and with metrics and methodologies for evaluating LLM outputs

Preferred Skills
• Experience with popular LLM/RAG frameworks
• Familiarity with distributed computing platforms (e.g., Apache Spark, Dask)
• Knowledge of data versioning and experiment tracking tools
• Experience with cloud platforms (AWS, GCP, or Azure) for large-scale data processing
• Understanding of data privacy and security best practices
• Practical experience implementing data lakehouse solutions
• Proficiency in optimizing queries and data processes in Snowflake or Databricks
• Hands-on experience with different vector store technologies

Benefits
US employee benefits package.
