Accessibility-AI-and-Video

Description

The “Accessibility AI and Video” project aims to make educational content more accessible by converting lecture videos into book formats (e.g., PDFs). This approach benefits diverse learners by offering an alternative, interactive medium for studying, especially for those who prefer or require textual materials over video.

Objective

To improve accessibility and enhance learning by transforming lecture videos into readable, interactive book formats that serve all learners.

Methods and Technologies

  • Video Segmentation:
    • Developed a Python-based pipeline to segment lecture videos into chapters and extract representative images for each segment.
    • Utilized OCR, face detection, and AI models to automate the segmentation process.
  • Handling Slide Transitions and Animations:
    • Addressed challenges such as annotations, animations, and transitions using advanced open-source AI models (e.g., Ollama Vision).
    • Improved accuracy compared to traditional methods such as pixel-based analysis and support vector machine (SVM) classifiers.
  • Output Design:
    • Designed book outputs in accessible formats (e.g., PDFs) to ensure usability across diverse user groups.
    • Integrated representative images alongside textual content for an enhanced learning experience.
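The segmentation step above can be sketched in a few lines. The sketch below assumes one coarse feature vector per sampled frame (how those vectors are produced — raw pixels, OCR text embeddings, or CNN features — is abstracted away) and marks a chapter boundary wherever consecutive frames differ by more than a threshold; the function names and threshold value are illustrative, not the project's actual code.

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def segment_into_chapters(features, threshold):
    """Return a list of (start, end) frame-index ranges, one per chapter.

    A new chapter starts wherever the distance between consecutive
    frames' feature vectors exceeds `threshold`.
    """
    boundaries = [0]
    for i in range(1, len(features)):
        if euclidean(features[i - 1], features[i]) > threshold:
            boundaries.append(i)
    boundaries.append(len(features))
    return [(boundaries[i], boundaries[i + 1] - 1)
            for i in range(len(boundaries) - 1)]

# Toy run: three near-identical "slides", then an abrupt change.
frames = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]]
print(segment_into_chapters(frames, threshold=1.0))
# [(0, 2), (3, 4)]
```

In the real pipeline each detected range would then be paired with a representative image and its transcript segment before being written into the book output.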

My Contributions

  • Research Leadership:
    • Led a team to investigate and implement scene-detection techniques for processing instructional videos, improving accessibility for students with different learning needs.
  • AI and Automation:
    • Developed and refined a CNN-based approach (using VGG-16 features) to detect significant scene changes in lecture videos, reducing false positives in frame selection by thresholding the Euclidean distance between consecutive frames' feature vectors.
    • Enhanced segmentation accuracy by integrating open-source AI models like Ollama Vision, handling complex video elements like incremental annotations and on-screen writing.
  • Performance Optimization:
    • Incorporated methods such as the Structural Similarity Index Measure (SSIM) and OCR to identify keyframes, while reducing runtime with early-dropping techniques.
  • Output Design:
    • Converted segmented video data into accessible book formats (PDFs), featuring well-organized chapters with representative visuals, transcripts, and summaries.
  • Future-Oriented Development:
    • Collaborated to explore the potential of large multimodal AI models, including LLaVA 7B and PaliGemma, while strategizing improvements for next-generation scene-detection pipelines.
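The SSIM-plus-early-dropping idea mentioned above can be illustrated with a simplified, global SSIM over grayscale frames. This is a sketch of the technique, not the project's implementation: it treats each frame as a flat list of pixel intensities in [0, 255], and the "early drop" is a cheap mean-intensity check that skips the costlier SSIM pass when the frame clearly has not changed (the cutoff values are illustrative).

```python
# SSIM stability constants, per the standard formulation with L = 255.
C1, C2 = (0.01 * 255) ** 2, (0.03 * 255) ** 2

def mean(xs):
    return sum(xs) / len(xs)

def ssim(x, y):
    """Global (single-window) SSIM between two grayscale pixel lists."""
    mx, my = mean(x), mean(y)
    vx = mean([(p - mx) ** 2 for p in x])
    vy = mean([(p - my) ** 2 for p in y])
    cov = mean([(p - mx) * (q - my) for p, q in zip(x, y)])
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

def is_keyframe(prev, curr, ssim_cut=0.9, mean_cut=1.0):
    """Flag `curr` as a keyframe if it differs enough from `prev`."""
    # Early drop: if mean intensity barely moved, assume "same slide"
    # and skip the full SSIM computation entirely.
    if abs(mean(prev) - mean(curr)) < mean_cut:
        return False
    return ssim(prev, curr) < ssim_cut

same = [10, 20, 30, 40]
changed = [200, 180, 160, 140]
print(is_keyframe(same, same))     # False (early-dropped)
print(is_keyframe(same, changed))  # True
```

Production code would compute SSIM over local windows rather than globally, but the early-drop shortcut is what keeps the pass fast on long lectures where most consecutive frames are identical slides.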

Key Skills Gained

  • AI Models (e.g., Ollama Vision, convolutional neural networks (CNNs))
  • OCR and Video Processing
  • Accessibility Design
  • Research Collaboration and Leadership

Outcome

This project developed a robust pipeline for converting educational videos into structured, accessible books, benefiting learners with visual or auditory impairments and those who prefer text-based resources. It also laid the groundwork for creating interactive, engaging content formats like video summaries.

For more information, including the repository and implementation details, visit the project on GitHub.