Qso.ai’s Roadmap: From Clips to Autonomous Social (Full Transcript)

Vedant shares how Qso.ai uses accurate transcription to turn long videos into publish-ready social content—and why SaaS is shifting to done-for-you AI.

[00:00:00] Speaker 1: Hi everyone, it's Matt here from Assembly. I'm joined here today by Vedant from Qso. Vedant, thank you so much for joining us today. We just wanted to ask some questions to Vedant, get him to share about Qso, his company, and hopefully we can dig into how he's building with Voice AI and how Assembly AI has been a partner for him, but also we'll get to see his product, see what he's building, the problem that he's solving, and learn more about him. Vedant, thanks for joining. Maybe you could start with a short self-intro about yourself and your company.

[00:00:33] Speaker 2: Hey Matt, thanks so much for having me here. I'm Vedant, one of the founders of Qso.ai. Our software allows people who have absolutely no social media expertise to show up on social media every single day without burning out or having to think about the nitty-gritty of what it takes to have a great social presence. We've built software that's a combination of an AI-powered video editor, a social media manager, and the multiple other pieces anybody would need to show up on social media, and put it all together in one place. So far, over the last four years, we've had over four million people sign up and use our software, and we have a large customer base spread across pretty much every country in the world.

[00:01:26] Speaker 1: Very nice. And maybe you could start by telling me the story of Qso. I know it used to be called Video.ai, but how did the idea come about, and what was the problem that you were really trying to solve?

[00:01:37] Speaker 2: Yeah, the problem was very simple. I had spent six years working as a social media manager for a media company in India, and the biggest challenge we faced was just the amount of time, effort, and human resources we needed just to maintain a presence on social media. Each post we created had to go through an elaborate process that involved multiple people, and that meant turnaround times were really high. The bottleneck, in almost all cases, was that you needed someone who understood how a complex piece of software like a video editor works in order to create any of the content people are consuming. And people are obviously consuming a lot of video; that's been true for the past ten years. This was one of the biggest challenges in my job and my career. So, after working with that company for four years, I thought, hey, I've got to do something about this and take a shot at solving this problem myself, with the understanding I'd built at that company. And that's how the software was born. Yes, it was previously called Video.ai, but we changed the name to Qso.ai to truly reflect what we're trying to do, which is quick social.

[00:02:54] Speaker 1: Nice. Oh, now I understand. Quick social. Okay, cool. Very nice. And what was the first version like? I guess, four years ago now, speech-to-text has come a long way. Was the first version of Video much simpler? Are there features that you've been able to build now that speech-to-text has gotten so accurate? What was the core feature set at launch?

[00:03:20] Speaker 2: Yeah, of course, the software has evolved massively in the last four years. When we started, there was no AI in the software world. We launched pre-AI; we launched the product before ChatGPT. So at that point in time, AI wasn't as cool as it is today, right? We launched with a very simple workflow where you could upload a long video and get a bunch of short videos from it. That's it. There was no social media management, there was no captioning. It was just one long video that you upload; it takes you through a linear process and you get multiple short clips that you can choose from and post to your social media. So it instantly takes one long video and creates multiple assets from it. And at that point in time, if you remember, Reels and short videos had just started taking off, so our timing was right there. In terms of speech-to-text as well, the market has evolved massively from what the status quo was then to what's available today. But yeah, even the very first version of our software did have speech-to-text, and there was no GPT, like I said. So there were manual pipelines we had built to take that speech-to-text data, apply intelligence to it without ChatGPT, make short clips, and go from there.

[00:04:39] Speaker 1: I remember one of the most interesting workflows your team was working on was lip syncing to see who was speaking and then panning the camera accordingly. We would produce the speaker labels, and then you would run facial recognition and clip according to who our speaker labels said was speaking. Absolutely. And three years ago, that was wild. But now we have speaker identification built into the API. So I imagine the product has changed a lot over time as the technology has gotten much better. And it's kind of interesting to look back on how we used to do things, our old methods and our old paradigms, and how technology has simplified so much in such a short span of time. Absolutely. That's true.
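The speaker-aware reframing described here can be sketched in a few lines. This is a hypothetical illustration, not Qso's actual pipeline: it assumes a diarization step (such as speaker labels from a transcription API) has already produced per-utterance speaker tags, and it simply merges consecutive same-speaker utterances into camera-cut segments.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str   # diarization label, e.g. "A", "B"
    start_ms: int  # utterance start, in milliseconds
    end_ms: int    # utterance end, in milliseconds


def speaker_segments(utterances):
    """Merge consecutive same-speaker utterances into camera-cut segments."""
    segments = []
    for u in utterances:
        if segments and segments[-1][0] == u.speaker:
            # Same speaker keeps talking: extend the current segment.
            speaker, start, _ = segments[-1]
            segments[-1] = (speaker, start, u.end_ms)
        else:
            # Speaker changed: start a new segment (a camera cut).
            segments.append((u.speaker, u.start_ms, u.end_ms))
    return segments
```

Each resulting (speaker, start, end) tuple marks a span where the frame can stay locked on one face before cutting to the next speaker.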

[00:05:31] Speaker 2: And at that point in time, I also remember there was auto-detection of language, but it just wasn't as good as it is today. Today we have one model for so many languages; that was not the case back then. You had to specify, hey, this is the language, and if you didn't, everything got messed up. It's evolved massively since then. Absolutely.

[00:05:56] Speaker 1: Yeah. And I guess on that topic, when did transcription quality start to make the product possible? Was there a specific moment where you realized this technology is real, it's really working, this product is going to work?

[00:06:10] Speaker 2: I mean, for a software like ours, for a service like ours, where the end product is derived from the transcription, transcription quality is extremely important, because not only is it used in the intelligence and processing that shapes the output, it actually shows up in the output as captions, right? The words that have been spoken are burnt onto the videos as captions so that people can follow along. So from the start, accuracy was really important for our use case, and there was absolutely no going around that. And obviously, like every other team would, we tested out a bunch of what was available in the market at that point, including the big tech speech-to-text models, which is the most obvious choice. When you're thinking speech-to-text and most of your infrastructure is already on GCP or AWS, that's what you're thinking for adding more services as well, right? So we did our test runs with a bunch of those services. They were never up to the mark in terms of accuracy, and that became a core bottleneck. Like I said, if there are no accurate subtitles, the entire process breaks for us. It has to be the best quality from the start. And that's when we started looking at other vendors and comparing internally: hey, if we were to run the same video through five different services, which one gives us the best results? And that's how we ultimately settled on AssemblyAI.
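A vendor bake-off like the one described usually comes down to word error rate (WER) against a human-checked reference transcript. A minimal sketch of that comparison, with invented vendor names and outputs purely for illustration:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # d[i][j] = edits to turn the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[-1][-1] / max(len(ref), 1)


# Hypothetical vendor outputs scored against a human-checked reference.
reference = "the words that have been spoken are burnt onto the videos as captions"
hypotheses = {
    "vendor_a": "the words that have been spoken are burnt onto the videos as captions",
    "vendor_b": "the words had been spoken are burned into the videos its captions",
}
ranked = sorted(hypotheses, key=lambda name: wer(reference, hypotheses[name]))
```

Running the same evaluation set through each candidate service and ranking by WER is the "five services, same video" comparison in miniature.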

[00:07:48] Speaker 1: Nice. And maybe you can also give us a bit of a sneak peek. What are you building now? What is Qso thinking about on the roadmap?

[00:07:57] Speaker 2: I mean, like I said, software has evolved massively over the last four years, and today SaaS tools in general are becoming less about what features you have and more about what job you can get done for your customer, right? And that's essentially where we are. The current version of our software, what we had been building so far, was about, hey, how can we give you as much content as we possibly can repurpose from your stuff, and then you make the decision of posting it. Now we're completing the entire lifecycle: when you upload something, when you give something to us, we'll have agents, a system that autonomously schedules and creates social media content for you and publishes it on your behalf as well. So the entire loop is completed, as compared to you intervening inside the loop, which used to happen all the time with any SaaS tool, where you had to click and get the work done while the tool offered some arbitrage. I think that's completely shifting to a done-for-you, autonomous loop completion.

[00:09:07] Speaker 1: Yeah, it strikes some thoughts for me. It's no longer about just one workflow, or this tool being one thing that I use to get my whole job done. I want something that does everything end to end. I'm thinking of OpenCode or Claude Code: one tool, one place, that's my workstation for this particular slice of work. I'm guessing Qso would be something similar for social media marketers. Even for someone like myself, a technical product marketing manager, I can just upload everything that I've got in long form and get lots of little clips out that I can put on X, put on LinkedIn, put on YouTube Shorts and Instagram Reels.

[00:09:54] Speaker 2: We don't want to stop at just the videos. What we understood is that there's a large part of the content you're creating that is not video. For example, with this interview, we could create short videos out of it, but you could also take a one-liner quotation or a testimonial and put that out as a photo. And at the same time, you could make an infographic of how the Qso.ai team decided to start using Assembly and what they used. Creating those assets would have taken hours, and it would never even have come to mind pre-AI. Post-AI, if you're able to capture this conversation, you're able to repurpose it into any kind of content: newsletters, blog posts, you name it. So we want to build those entire workflows for you. All you have to do is record your Zoom call, or your Riverside for that matter, drop it into our software, and we repurpose it into social media content for the next 30 days.
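The "content for the next 30 days" idea amounts to fanning one source recording out into many assets and queueing them across a calendar. A minimal round-robin sketch, assuming nothing about Qso's actual scheduler; the function name and parameters are invented for the example:

```python
from datetime import date, timedelta


def schedule_assets(assets, start, per_day=1):
    """Spread repurposed assets across upcoming days, per_day posts per day."""
    plan = {}
    day = start
    for asset in assets:
        plan.setdefault(day, []).append(asset)
        if len(plan[day]) == per_day:
            day += timedelta(days=1)  # this day is full; move to the next
    return plan


# One source interview fanned out into a queue of mixed asset types.
assets = ["short clip 1", "quote card", "infographic", "newsletter draft"]
plan = schedule_assets(assets, start=date(2025, 1, 1))
```

An autonomous loop would then walk this plan and publish each day's entries without the user clicking anything.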

[00:10:54] Speaker 1: It's cool that a lot of the work being done is over these conversations. We're just capturing people talking, natural, unscripted, honestly not even that prepared, but there's quality and an exchange of value in these conversations that short clips and videos are able to capture really well. Yeah. Well, thanks for answering those questions. All I really had next was: if you want to give some advice for founders, product managers, maybe even social media managers watching who want to use AI-powered video and audio tools like Qso, what would you say to them?

[00:11:36] Speaker 2: There's never been a better time to innovate, either using AI to get your work done or using AI to be seen more. I feel that no matter what your goal is in today's day and age, when you're building product, shipping product, or marketing product, AI is there at every part of the workflow, and there's a new AI tool that comes up every day. There are probably like 50 AI tools launched every single day. What that means for product managers and product builders is that software as a moat is collapsing. Any guy with a computer and a cloud subscription can now build software. And that means that building software, seeing your ideas through into actual real work, is not going to be a barrier anymore. It's the same thing that happened to content creation when everybody got an iPhone, a camera phone, right? Everybody could create, and look how that's changed everything. The same thing is going to happen for software and using AI in the next three to five years. It's so easy to build today that nothing is going to stop you from trying something, experimenting, and putting something out there that's worth noticing. And even if it only changes the lives of a very small set of people, it's still valid, right? The old equations, the power laws of software, don't hold true anymore. So I think everyone should go and try creating something of their own, using AI as fast as possible, using voice AI for that matter, as fast as possible.

[00:13:08] Speaker 1: I keep seeing on X that SaaS is dead. And honestly, I think these apps, especially at the application layer, are having such a great time, because adoption is increasing and people are more open to trying AI. Not everyone is going to, you know, vibe-code a Qso over the weekend, but they're actually more open to trying these things out and seeing the real value that comes from using these tools.

[00:13:34] Speaker 2: Yeah, I agree. I think the overall market is going to change. When people say SaaS is dead, they mean SaaS in the traditional sense, what it used to be and the way it was sold, is dead, because anyone else can create now. The pricing model was always per seat, not per unit of value, and that's changed completely, because someone else can come in just as easily and start charging per unit of value, and then your entire model falls flat. So I think that's definitely changing, but it's also expanding the market. It's great news for all of us in the business, because more and more people want to use these tools.

[00:14:10] Speaker 1: Yeah, great. That's all I really had. All I wanted to do next was to see QSO in action. Maybe you could show me around the dashboard, show me how to create these snippets and we can learn more about your product that way.

[00:14:27] Speaker 3: Yeah, so this is what the software looks like.

[00:14:31] Speaker 2: What you can do is, and by the way, this is going to change in the next couple of months, but I'm just going to show you how this works. You can drop in any video link from YouTube, Instagram, or Facebook, upload from your computer, etc. And then you can ask the software to do a bunch of things with those videos. You can ask it to create short videos, add captions, write stuff around the video, and use some other features. But I'm just going to show you an example of what it does. So this is a video that I got from YouTube, and what it's done is created a bunch of short videos from it. And it's given each of the short videos a score. You see that there's a score; this video scored a 97, so it's likely to do very well. Now, see what it's done to the video: it's understood every frame of the video and automatically reframed it.
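The per-clip score shown in the demo could be approximated with a toy heuristic like the one below. This is purely illustrative and is not Qso's scoring model: the hook-word list, weights, and length window are all invented for the example.

```python
def clip_score(transcript: str, duration_s: float) -> int:
    """Score a candidate clip from 0-100: reward hook words and a shareable length."""
    hooks = {"top", "secret", "mistake", "free", "money", "never"}
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    hook_hits = sum(w in hooks for w in words)
    # Clips between 15 and 60 seconds tend to fit short-form slots best.
    length_fit = 1.0 if 15 <= duration_s <= 60 else 0.5
    return min(100, int((40 + 20 * hook_hits) * length_fit))
```

A real system would likely score clips with a trained model over the transcript and engagement data, but the idea is the same: a single number per clip so the user can pick the best candidates quickly.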

[00:15:21] Speaker 4: This one is the most technical tool out of the list here, which is N8n. This is an AI orchestration tool. What does that mean? It allows you to link all of these different AI.

[00:15:30] Speaker 2: So it's automatically reframed every scene that we're talking about today. When it detects that somebody else is talking about something, it'll automatically change.

[00:15:41] Speaker 4: The entire layout, like in this case, 500 AI tools to discover the top nine to blow up your business, help you make money. You've got no money.

[00:15:50] Speaker 2: And now, if you want to change stuff: so it's created all of these short clips, and many of them are likely good enough to be posted directly online. In that case, you can just click on this button that says share, link your social media accounts, and post the video directly to social media. Or if you want to change some things, you have a powerful editor that allows you to change things up: change the style, cut up some parts, remove stuff. All of the typical video editing options are also available here. So connect your social media, drop in a video. And you can also use this nifty little tool called Viddy, which is like ChatGPT for your video. You can ask Viddy to create timestamps, show notes, summaries, quotes, titles. So let's say I ask it to generate a summary of this video: it will go through the entire video and create a summary. And if I wanted to do other stuff around the video, create other content like SEO blog posts, newsletters, things like that, it will do all of that as well. This is a video that Jude shared with me.

[00:17:00] Speaker 3: So here you go. So it's automatically.

[00:17:05] Speaker 1: What we think is super cool about this is, you know, because you have the control of prompting for different use cases, this may be good and this may be bad. Right.

[00:17:14] Speaker 2: Here you can see the layout that it automatically assigned: it noticed that Ryan was speaking in one corner, so it gave Ryan this spot. Now I can change this. If I want to change the layout a little bit, if I want to make Ryan bigger, I can do something like this.

[00:17:34] Speaker 3: And I can change this. Yeah. And then I can crop this so I can ensure it is in the frame here.

[00:17:48] Speaker 2: So I can do that. This is what we've been up to. This is going to be live in a month or so. Sorry, in a week, actually. And now I can even say, I don't want that part, I just want this part, right? So it's automatically done all of this work for you as well.

[00:18:08] Speaker 1: Viddy kind of reminded me of uploading Shorts on our YouTube channel. To be honest, it's kind of a game of finding the perfect caption and the perfect description. And because you guys have all the experience in social media, obviously, I'm no expert, you guys are the experts, you're able to transfer that expertise to the user, because you already know what works. You guys have the cheat codes. You just share that with us, and I get the benefit of all of that knowledge. That's great.

[00:18:41] Speaker 2: Yeah, absolutely.

[00:18:43] Speaker 1: That was really exciting. Thanks for your demo. It was great to see what you guys are building. It's amazing what AI is able to do these days. I'm quite fascinated by the workflows and the platform, and I'm really glad that I got to hear your story of building Qso, how you evaluated your speech-to-text provider, and why it was important. Overall, I really enjoyed this conversation. I'm really glad you took the time to share this with us today, Vedant.

[00:19:15] Speaker 2: Of course. Thank you so much.

[00:19:17] Speaker 1: Thanks for your time.

[00:19:17] Speaker 3: See ya.

[00:19:18] Speaker 1: See you.

AI Insights

Summary
Matt from Assembly interviews Vedant, co-founder of Qso.ai (formerly Video.ai), a platform that helps people with little social media expertise consistently create and publish social content without burnout. Vedant explains the origin story: his experience as a social media manager revealed video editing and turnaround time as key bottlenecks. The initial product repurposed long videos into multiple short clips using early speech-to-text and custom pipelines before the recent AI boom. Transcription accuracy is critical because it drives both clip selection and on-video captions; after testing major cloud providers, Qso chose AssemblyAI for better accuracy. Qso’s roadmap shifts from feature-centric tools to an autonomous, end-to-end “done-for-you” workflow that generates varied assets (short clips, quote cards, infographics, newsletters/blogs) and schedules/publishes across platforms. In a demo, Vedant shows automatic clip generation with performance scores, auto-reframing/speaker-aware layouts, an in-browser editor, social publishing, and “Viddy,” a ChatGPT-like assistant for video that produces summaries, timestamps, titles, and more. The conversation ends with advice: AI is making software creation easier, changing SaaS pricing/value models, expanding the market, and founders should experiment and build quickly—especially with voice AI.
Title
How Qso.ai Uses Voice AI to Automate Social Content Creation
Keywords
Qso.ai, AssemblyAI, speech-to-text, transcription accuracy, video repurposing, short-form video, captions, auto-reframing, speaker detection, social media automation, AI video editor, autonomous agents, SaaS evolution, value-based pricing, Viddy
Key Takeaways
  • Qso.ai helps non-experts maintain daily social media presence by repurposing long-form video into ready-to-post assets.
  • The product began pre-ChatGPT with a simple long-video-to-multiple-clips workflow and custom STT-based pipelines.
  • Transcription accuracy is foundational because it powers both content intelligence (clip selection) and user-visible captions.
  • After benchmarking multiple providers, Qso selected AssemblyAI due to superior accuracy for their use case.
  • Qso is moving from tool-based workflows to an autonomous, end-to-end system that creates, schedules, and publishes content.
  • Beyond video clips, Qso aims to generate multi-format assets like quote cards, infographics, newsletters, and blog posts from the same source content.
  • The demo showcased scored clip suggestions, automatic reframing based on who’s speaking, a built-in editor, direct social publishing, and a video assistant (“Viddy”) for summaries/timestamps/titles.
  • AI is lowering barriers to software creation, pressuring traditional per-seat SaaS models and pushing toward value-based pricing and ‘done-for-you’ experiences.
Sentiments
Positive: Optimistic, forward-looking discussion highlighting product progress, strong partnership value from accurate transcription, excitement about AI-enabled workflows, and constructive views on how SaaS is evolving.