- Stage 1 :: Projection Matrix Alignment between Vision Encoder & Pretrained LLM on CC-3M-595K (Custom)
- Stage 2 :: Projection & LLM Finetuning on LLaVa v1.5 Instruct (including various ...
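
The two stages differ mainly in which components are updated. Below is a minimal sketch of that split, not the authors' training code. The component names (`vision_encoder`, `projector`, `llm`) and the helper `trainable_flags` are hypothetical, and the sketch assumes the vision encoder stays frozen in both stages, which the stage descriptions imply but do not state.

```python
from typing import Dict

# Hypothetical component labels, used for illustration only.
COMPONENTS = ("vision_encoder", "projector", "llm")

# Components that receive gradient updates in each stage.
# Assumption: the vision encoder is frozen throughout.
TRAINABLE_BY_STAGE = {
    1: {"projector"},         # Stage 1: projection alignment only
    2: {"projector", "llm"},  # Stage 2: projection and LLM finetuning
}


def trainable_flags(stage: int) -> Dict[str, bool]:
    """Return a {component: is_trainable} map for the given stage."""
    if stage not in TRAINABLE_BY_STAGE:
        raise ValueError(f"unknown stage: {stage}")
    trainable = TRAINABLE_BY_STAGE[stage]
    return {name: name in trainable for name in COMPONENTS}


if __name__ == "__main__":
    for stage in (1, 2):
        print(stage, trainable_flags(stage))
```

In a PyTorch setup, flags like these would typically be applied by setting `requires_grad` on each component's parameters before building the optimizer.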
Multi-Speaker Hindi TTS with VITS

A production-quality multi-speaker Hindi Text-to-Speech system built on the VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) ...