Senior Prompt Engineer - Data Science & Quality Analysis

Checkmate • Full-time • Remote • 3d ago

As a Senior Prompt Engineer focused on Data Science and Quality Analysis, you’ll design, test, and evaluate prompts for AI systems that interact with real-world restaurant data. You’ll work cross-functionally to develop AI solutions that drive operational efficiency, improve data interpretation, and support smarter decision-making for restaurant operators.

Your work will directly influence how AI models perform in high-stakes, dynamic environments such as order processing, reporting, support automation, and performance analysis.

Proven experience in prompt engineering and working with LLMs (GPT-4, Claude, Gemini, and LLaMA) for text generation, reasoning, and structured data extraction.
Proficiency in Python and SQL for data analysis, evaluation scripting, and workflow automation.
Strong background in A/B testing, statistical analysis, and performance metrics evaluation, with the ability to design experiments and interpret data-driven insights for continuous model optimization.

Familiarity with prompt-evaluation tools such as Langfuse or Galileo, and with Weights & Biases for experiment management and regression testing.
Deep understanding of advanced prompting techniques, including few-shot prompting, reasoning-based prompting, multi-turn dialogue design, agentic orchestration, and DSPy/AdalFlow-style programmatic prompting frameworks.
Experience applying the CO-STAR and TIDD-EC prompting frameworks for structured reasoning, instruction design, and context control in production-grade LLM systems.
Excellent requirement-elicitation and communication skills, with the ability to translate business objectives into prompt engineering solutions.
Analytical mindset with a process-driven approach to optimizing model behavior, data quality, and operational workflows.

Design, test, and optimize LLM prompts for conversational AI, text classification, and structured data extraction tasks.
Build evaluation pipelines to analyze prompt performance using quantitative metrics, human-in-the-loop feedback, and business KPIs.
Conduct prompt experiments and regression testing to ensure stability, accuracy, and safety as models evolve.
Collaborate with Machine Learning, Product, and Operations teams to translate business objectives into scalable, data-driven prompt-engineering strategies that enhance model accuracy, efficiency, and real-world usability.
Use Python/SQL to analyze model outputs, identify anomalies, and automate quality checks.
Document best practices and contribute to internal frameworks for prompt evaluation and continuous improvement.
Communicate findings effectively to technical and non-technical stakeholders, driving measurable business impact through insight-driven decisions.

100% Remote

Salary $145,000 - $160,000

B.S. or higher in a quantitative discipline (Data Science, Computer Science, Engineering, or related field) or in a field relevant to language models (Linguistics, Philosophy, Cognitive Science, etc.).
5+ years of relevant experience with a B.S. degree, or 3+ years of experience with a Master’s degree.
Demonstrated proficiency in Python for automation, evaluation, and experimentation with LLM workflows.
Academic or applied research experience related to language models, prompt engineering, or LLM-based systems is a strong plus.
Familiarity with LLM architectures, embeddings, and fine-tuning techniques preferred.
Experience with LLM red-teaming, adversarial evaluation, or model safety testing is a plus.