Inspiration
The surge in popularity of generative AI owes much to its ability to exceed expectations, showcasing remarkable capabilities in recent years. Whether through text-to-text or text-to-image models, these AI systems have transformed even the most basic human imaginings into stunning realities. This motivated me to merge the strengths of both realms, text-to-text and text-to-image generative models, in an application that takes content generation to new heights.
What it does
My app asks the user to enter the name of the story's main character, a description of their physical appearance, the names of the side characters, and a one-line plot. From this input, the Claude model generates a short story, which the Stable Diffusion XL model then uses to generate comic images arranged in a comic-style layout.
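PartyRock is a no-code platform, so the app itself required no programming. For readers curious about the moving parts, here is a minimal sketch of the same two-stage pipeline written directly against the Amazon Bedrock API; the model IDs, payload shapes, and helper names are my assumptions based on Bedrock's public documentation, not what PartyRock runs internally:

```python
import base64
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def generate_story(main_character, appearance, side_characters, plot):
    """Ask Claude for a short story built from the four user inputs."""
    prompt = (
        f"\n\nHuman: Write a short story whose main character is "
        f"{main_character} ({appearance}), featuring {side_characters}. "
        f"The plot: {plot}\n\nAssistant:"
    )
    response = bedrock.invoke_model(
        modelId="anthropic.claude-v2",  # assumed model id
        body=json.dumps({"prompt": prompt, "max_tokens_to_sample": 1024}),
    )
    return json.loads(response["body"].read())["completion"]

def generate_panel(scene_description, out_path):
    """Render one comic panel for a scene with Stable Diffusion XL."""
    response = bedrock.invoke_model(
        modelId="stability.stable-diffusion-xl-v1",  # assumed model id
        body=json.dumps({
            "text_prompts": [{"text": scene_description}],
            "style_preset": "comic-book",  # the comic book preset mentioned above
            "cfg_scale": 8,   # prompt adherence vs. image quality trade-off
            "steps": 50,
        }),
    )
    image_b64 = json.loads(response["body"].read())["artifacts"][0]["base64"]
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(image_b64))
```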
How I built it
I built the app using AWS PartyRock, where I created four text input widgets: one each for the main character's name, the main character's physical appearance, the side characters' names, and the plot of the story. I then created a text generation widget backed by the Claude model and wrote a prompt to generate a short story from the user input. Finally, I created an image generation widget using the Stable Diffusion XL model with a comic-book style preset, and prompted it to generate multiple images from the generated story and arrange them in a comic-style layout.
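PartyRock prompts can pull in other widgets' values with @-mentions. The prompt below is an illustrative reconstruction (the widget names are placeholders, not my exact wording), but it captures the structure of the text generation prompt:

```
Write a short, engaging story whose main character is @MainCharacterName,
who looks like this: @Appearance. Include these side characters:
@SideCharacterNames. The story must follow this plot: @Plot.
Split the story into 4-6 numbered scenes, each with one vivid visual moment.
```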
Challenges we ran into
The main challenge I faced was tuning the parameters of the text generation and image generation widgets. In the text generation widget, raising the randomness parameter (temperature) makes the generated text more varied and interesting, but past a certain point the story drifts away from the context the user provided in the plot. In the image generation widget, raising the prompt-adherence parameter makes the model follow the prompt more strictly while generating the images, but compromises image quality. So the major challenge was finding a sweet spot in each parameter range: a story that stays interesting without losing the user's context, and images that follow the prompt without sacrificing too much quality. Beyond that, another challenge was working out the most effective prompts for both widgets.
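In PartyRock these knobs are sliders on each widget, so tuning was trial and error in the UI. As a rough illustration of the text-side trade-off, a sweep like the sketch below (model ID, payload, and values are assumptions, reusing the hypothetical Bedrock setup from earlier) makes the effect easy to compare side by side; the image side has the mirror-image knob in SDXL's cfg_scale, where higher values follow the prompt more strictly at some cost to quality:

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

PROMPT = (
    "\n\nHuman: Write a four-scene short story about a shy lighthouse keeper "
    "who befriends a storm.\n\nAssistant:"
)

# Higher temperature reads as more creative but drifts off-plot sooner.
# The values below are illustrative, not the ones I settled on.
for temperature in (0.3, 0.7, 1.0):
    response = bedrock.invoke_model(
        modelId="anthropic.claude-v2",  # assumed model id
        body=json.dumps({
            "prompt": PROMPT,
            "max_tokens_to_sample": 400,
            "temperature": temperature,
        }),
    )
    story = json.loads(response["body"].read())["completion"]
    print(f"--- temperature={temperature} ---")
    print(story[:300])
```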
Accomplishments that we're proud of
I am proud of creating an app that combines the power of a text-to-text generative model and a text-to-image generative model to turn a user's imagination into comic images.
What we learned
While creating this app, I learned how generative AI models work, explored the theory behind them, and gained hands-on experience by building the app end to end. I also learned about prompt engineering and how even a small change to a prompt can change the model's output. Finally, I learned how the various model parameters shape the output and how to choose values that harness the model's creativity without compromising quality.
What's next for Tale Sketch
- Generate longer stories from the user input, generate images for each part of the story, and stack them into a full comic book. Add a subscription plan (monthly, quarterly, or annual) so users can generate comics from their imagination on a paid basis.
- Create animated images (not in a comic layout) and stack them into an animated video; generate dialogue from the story, produce separate audio for each character's lines, and sync the audio with the video to create an animated film with sound.
Built With
- amazon-web-services
- partyrock