Generative AI

Interact with any media using video intelligence - a Codemonk showcase

Discover InScene Video Intelligence by Codemonk – groundbreaking AI tech for seamless media interaction, transforming video engagement and enhancing user experience.

Karan Kariappa

Jul 4, 2024 • 6 min read

A lot has been said about the adoption of GenAI in the science and technology arena, which has left business leaders puzzled about how GenAI might apply within their ventures. For instance, general-purpose products such as ChatGPT, Gemini, and Character.ai seldom address business-specific use cases. On the other hand, customised GenAI solutions, developed for internal business use cases, tend to be localised to specific environments. These two trends, alongside several subjective biases, have collectively raised concerns among business leaders about the possible scope of GenAI in improving their business outcomes.

Let me paint a picture for a clearer understanding. Although products built on ChatGPT and similar large language models (LLMs) have echoed loudly over the last 2-3 years, no single solution has grabbed the spotlight for redefining the way businesses function, and for good reason. Where GenAI solutions have managed to achieve something of this nature, companies tend to keep the resulting intellectual property (IP) to themselves, using it as a key differentiator that sets their product and service channels apart from the competition. Either way, realising tangible value through GenAI remains the ambition of the brave, as it requires substantial man-hours and resource allocation.

At the intersection of these two worlds, open-source GenAI and business-specific generative intelligence, lies a unique showcase of capabilities, one that inspires pioneering thought leaders to come to the table and demonstrate how they can deliver the multitude of use cases demanded by the enterprise world. Over the last 6 years, Codemonk has firmly cemented its seat at this table of innovation, enriching its pipeline of product engineering capabilities in the Generative Artificial Intelligence (GenAI) domain.

In the most recent showcase, Codemonk’s InScene intelligence has caught the attention of several thought leaders, prompting a deep dive into specific use cases that Codemonk can help organisations achieve, without necessarily expending abundant organisational resources.

Rise of Video Intelligence

Having observed several businesses double down on content curation tools, Codemonk realised the significance of automating learning management workflows, in which repetitive tasks such as transcription and meeting summarisation consumed substantial resources when done manually. For instance, a skilled person had to be dedicated to transcribing an audio or video recording to a high standard before it could be fed into a learning management system (LMS) to obtain actionable information, for example to improve a sales training process in which a feedback loop is used to gather feedback and take corrective action. Evidently, even this simple task required substantial manpower. Codemonk seized this opportunity to develop an InScene intelligence solution that could automate the generation of meaningful insights and improve operational efficiency.

Simply put, Codemonk devised a video intelligence framework that could be fed one or more media files (text, audio, video) for analysis and, in turn, return actionable responses. The engineering think tank first demonstrated this capability through an integrated chatbot, which would respond to user queries once fed with the relevant data sets.

By opening a dialogue with the video itself, the intelligence framework could recognise the contents of the video and respond with answers relevant to the context provided. These efforts culminated in Codemonk’s InScene video intelligence: an intelligence that could recognise in-scene events within a video.

One of the first proofs of concept Codemonk showcased using InScene video intelligence was the Automated AI-commentary Generator, which demonstrated the following use cases:

When fed a sporting video such as cricket match footage, Codemonk’s InScene video intelligence could understand the actions performed on screen as the video played out, producing text-based commentary for events occurring during playback.

It could produce engaging, statistical, and contextual commentary for cricket matches and pair it with a text-to-speech engine to produce lifelike commentary in audio format.

It could mimic the commentary styles of sporting legends such as Ravi Shastri in audio, overlaid onto the cricket match footage.

Watch the full video here:

AI Explorations, Episode 1: Live Cricket Commentary based on GenAI

Collectively, Codemonk’s first experiment with InScene intelligence produced an automated, AI-generated live commentary engine for any recorded cricketing event.
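To make the shape of such a commentary pipeline a little more concrete, here is a minimal sketch of only the middle step: turning already-detected events into lines of text commentary. It assumes a video model has emitted timestamped events upstream. The `BallEvent` schema, the function names, and the phrasing templates are all hypothetical illustrations, not Codemonk's implementation, and the video understanding and text-to-speech stages are out of scope here.

```python
from dataclasses import dataclass


@dataclass
class BallEvent:
    """Hypothetical event a video model might emit for one delivery."""
    timestamp_s: float
    batter: str
    bowler: str
    runs: int
    wicket: bool = False


def commentary_line(event: BallEvent, total_before: int) -> str:
    """Turn one detected event into a single line of text commentary."""
    total = total_before + event.runs
    if event.wicket:
        action = f"{event.bowler} strikes! {event.batter} has to go."
    elif event.runs >= 6:
        action = f"{event.batter} launches {event.bowler} for six!"
    elif event.runs == 4:
        action = f"{event.batter} finds the boundary off {event.bowler} for four."
    elif event.runs == 0:
        action = f"Dot ball from {event.bowler} to {event.batter}."
    else:
        action = f"{event.batter} takes {event.runs} off {event.bowler}."
    minutes, seconds = divmod(int(event.timestamp_s), 60)
    return f"[{minutes:02d}:{seconds:02d}] {action} Score: {total}."


def generate_commentary(events: list[BallEvent]) -> list[str]:
    """Produce commentary in playback order, carrying a running score."""
    total = 0
    lines = []
    for event in sorted(events, key=lambda e: e.timestamp_s):
        lines.append(commentary_line(event, total))
        total += event.runs
    return lines
```

In a fuller pipeline, each returned line would be handed to a text-to-speech engine and aligned with the footage using its timestamp. A production system would more plausibly generate the wording with a language model than with fixed templates.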

The Video Intelligence Showcase

Having validated the technology, Codemonk proceeded to develop multiple use cases and applications for video intelligence. The first of these was interactive media: the algorithms were trained to understand technical candidate interviews, first summarising the interview and then providing actionable insights into the candidate. The solution was eventually adopted by Zelevate, which built a technical candidate evaluation framework for recruitment processes around these capabilities. Instead of relying on inefficient or inaccurate third-party transcription services, Zelevate could automate its candidate evaluations completely, with the following key capabilities:

Live Interview Transcription
Interview Summarisation
Actionable information for interviewers
Actionable playback - allowing users to interact with the video in a question-and-answer format

The above capabilities helped Zelevate completely automate its Interview-as-a-Service & Recruit Assist offering, in which the company showcases one of the first solutions of its kind, a video QnA tool that could intelligently understand a video the way a human would. The video-intelligence-based chat interface empowered employers and interviewers to look up highly nuanced, specific topics within a technical interview, such as “How good is the Candidate’s understanding of Apache?” or “Can the candidate use Hibernate function in Java?” or “Where was the Django framework discussed?” and many more contextualised queries that traditional chatbots fail to understand. Interestingly enough, when extended to a library of videos and media, video intelligence succeeds in accurately responding to queries contextualised across more than one file. The solution also generates chapters from long-form video content.
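As a rough illustration of what answering a query like “Where was the Django framework discussed?” across more than one file involves, the sketch below searches timestamped transcript segments from several videos and returns the best matches. It uses plain keyword overlap as a deliberately simple stand-in; the article does not describe the retrieval method InScene actually uses, and the `Segment` structure, `find_mentions` function, and stop-word list are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment: which video, when, and what was said."""
    video_id: str
    start_s: float
    text: str


# Small illustrative stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "was", "were", "where", "what", "how", "in", "of", "to"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find_mentions(query: str, segments: list[Segment], top_k: int = 3) -> list[Segment]:
    """Rank segments by how many query terms they share (keyword overlap)."""
    terms = tokenize(query) - STOPWORDS
    scored = []
    for segment in segments:
        overlap = len(terms & tokenize(segment.text))
        if overlap:
            scored.append((overlap, segment))
    # Highest overlap first; ties broken by video id, then by timestamp.
    scored.sort(key=lambda pair: (-pair[0], pair[1].video_id, pair[1].start_s))
    return [segment for _, segment in scored[:top_k]]
```

Each returned segment carries a `video_id` and `start_s`, which is what would let a chat interface jump playback to the relevant moment. Keyword overlap misses paraphrases, so a production system would more likely rely on semantic retrieval.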

Codemonk’s InScene video intelligence also picks up topics discussed between two people in a long-form conversation, so users can simply ask the chatbot a technical question, say about physics, and obtain a suitable response.

For instance, if two people are talking about the laws of motion, the chatbot could provide contextual responses to questions like, “What is displacement, as described by John in the lecture?”

The solution could also recognise musical notes played within a video and identify their source. Recognising the validity of the solution, one of the largest music-producing studios in India approached Codemonk to monetise its audio libraries. The two companies are finalising the scope and prospects of the collaboration in an effort to improve revenue-generating channels within the music, streaming, and licensing industry.

Another important use case within the music industry is metadata enrichment, which could be automated using Codemonk’s InScene. For producers who store decades' worth of music data in massive libraries, as reference points for the creation of new content, Codemonk’s InScene intelligence could serve as a guide that helps them pick out contextual information from massive repositories of raw data. In this case, multiple video libraries are analysed at once to obtain InScene video intelligence.

These capabilities are merely the tip of the iceberg in terms of Codemonk’s competencies in the GenAI space. The company is planning to roll out a series of experiments and test cases for multiple real-world requirements, showcasing the applicability of its technology, all the while collaborating with tech giants to bring out utilitarian products and services. More importantly, Codemonk is offering business leaders a guided tour of its Generative Artificial Intelligence (GenAI) showcase, inviting them to explore new opportunities to improve business outcomes and drive revenue across engagement channels.

Stay tuned for the next update from Codemonk’s think tank.
