AI Platform Specialist Job at VDart Inc, Remote

  • VDart Inc
  • Remote

Job Description

Job Title: AI Platform Specialist
Job Location: Remote
Job Type: Contract


Job Description: AI Service Hosting

AI Platform Specialists:

  • We are building a new team of platform specialists to support and enhance high-performance AI services. These are highly technical, hands-on roles focused on customer, application, and platform support of AI-focused workloads.
  • AI Platform Specialists will provide application and GPU support. The team will deliver Tier 1 and Tier 2 support to developers and engineers while collaborating closely with Tier 3 and Tier 4 platform teams and vendors to resolve issues. The roles require user-level knowledge of Kubernetes, virtualization, and cloud-native technologies, as well as operator-level knowledge of GPUs and other AI supporting services. Each specialist should focus on customer service while pursuing reliability, scalability, and performance.

Key Responsibilities:

  • Platform Support & Incident Response
      • Provide Tier 1 and Tier 2 support for AI-driven applications and workloads.
      • Troubleshoot and resolve issues related to Kubernetes deployments, GPU utilization, and service performance.
      • Collaborate with Tier 3+ teams, including Kubernetes engineers and external vendors, to escalate and resolve complex issues.
  • Kubernetes & Cloud-Native Operations
      • Fully adopt, create, and integrate automated services using tools such as Helm, Ansible, and Terraform.
      • Deploy, manage, and support containerized AI workloads on Google Anthos-powered Kubernetes clusters.
      • Ensure adherence to pod security policies, automated rollouts/rollbacks, and best practices for scalable and secure Kubernetes environments.
  • GPU Infrastructure & AI Services Management
      • Optimize and support GPU-enabled workloads, including CUDA and other AI acceleration frameworks.
      • Assist in the installation, configuration, and support of AI coding assistants (e.g., Codeium).
  • Observability & Documentation
      • Maintain detailed operational documentation, runbooks, and troubleshooting guides.
      • Use monitoring and logging tools such as New Relic, BigPanda, Prometheus, Grafana, and other observability frameworks.
  • Process Improvement & Collaboration
      • Work cross-functionally with developers, IT teams, and vendors to ensure seamless deployment and support of AI services.
      • Contribute to CI/CD pipelines, automation, service, and security best practices.
      • Track and communicate work through task management platforms (ServiceNow and Jira).



Required Skills & Experience

  • Hybrid Cloud: In-depth knowledge of private (on-premises) and public (GCP & AWS) cloud architectures and services.
  • AI/ML Software: Developer experience with DevOps practices (Git, Jenkins, etc.), as well as experience working with AI/ML engineers and data scientists.
  • AI/ML Hardware: Experience deploying, supporting, and optimizing on-premises and cloud GPU-enabled (NVIDIA & AMD) infrastructure (VMs & containers).
  • Kubernetes Expertise: Hands-on experience deploying and managing containerized workloads in Kubernetes.
  • Technical Support & Troubleshooting: Proven ability to diagnose and resolve customer and platform issues in production environments.
  • Strong Communication & Documentation: Ability to clearly document procedures, write knowledge base articles, and collaborate with customers and teams.
  • Time Management & Accountability: Ability to work independently, prioritize tasks, and manage workload effectively.



Preferred Qualifications

  • Experience with GPU orchestration tools such as Run:AI, NVIDIA AI Enterprise, VMware Private AI Foundation, etc.
  • Exposure to AI coding assistants like Codeium, Copilot, or Tabnine.
  • Proficiency with development tools such as Python, PyTorch, TensorFlow, and Jupyter Notebooks.

About the Team & Reporting Structure

  • These positions will report to the Senior AI Architect and work as peers within a specialized AI support team. Collaboration with internal VM and container support teams, as well as NVIDIA, Codeium, and other vendor specialists, will be essential for supporting customers, troubleshooting, and optimizing AI workloads.

Job Tags

Contract work, Remote work
