Subscribe to the Kognetiks Chatbot for WordPress Substack Click Here
X
Lonely vessel seeks horizon, finds storm.

Enhancing the Kognetiks Chatbot: New Frontiers in Image and Speech Generation

Over the past several weeks I have been working on extending support for image and speech generation for the Kognetiks Chatbot for WordPress in Version 1.9.5.  While enhancing the chatbot’s capabilities within these two areas, other minor improvements have been made along the way.

Chatbot, Image Generate, Speech Generation

I started by asking the chatbot to write a six-word short story which produced the following result:

Kognetiks Chatbot - Prompt - Lonely Vessel
Kognetiks Chatbot – Prompt – Lonely Vessel

Lonely vessel seeks horizon, finds storm.

The shortcode I used for this example is the most basic shortcode you can use to call the chatbot.

[chatbot style=embedded]

Kognetiks Chatbot - Prompt Generation Short Code
Kognetiks Chatbot – Prompt Generation Short Code

Next, I asked the chatbot to generate an image for my short story which produced the following result:

Kognetiks Chatbot - Image - Lonely Vessel
Kognetiks Chatbot – Image – Lonely Vessel

On a different page, I added image generation capabilities using the additional parameter in the shortcode to specify the model to use.  In this case I’m using dall-e-3 model.

[chatbot style=embedded model=dall-e-3]

Kognetiks Chatbot - Image Generation Short Code
Kognetiks Chatbot – Image Generation Short Code

And lastly, I asked the chatbot to convert my short story from text to speech

which produced the following result:

And finally, on a different page, I added speech generation capabilities by including both a speech model and selecting a voice.  In this case I’m using the tts-1-1106 model and the voice of Fable.

[chatbot style=embedded model=tts-1-1106 voice=fable]

Kognetiks Chatbot - Speech Generation Short Code
Kognetiks Chatbot – Speech Generation Short Code

Model Settings

The model setting can be found on the API/Model tab in the Chatbot Settings.  These settings have been broken out into several sections including: API Settings, Chat Settings, Voice Settings, Image Settings, and Advance API Settings.

Chat Settings

The Chat Settings allow you to specify the default model, maximum tokens, and conversation context.  You may have already noticed that the list of available models has been expanded to include any of the GPT series models supported by OpenAI.  This includes the latest model “gpt-4-turbo-2024-04-04” that was just released.

Kognetiks Chatbot - Updated Model Settings
Kognetiks Chatbot – Updated Model Settings

Voice Settings

The Voice Settings allow you to specify the default speech model, the voice to be used and the output format.  Speech models include: tts-1, tts-1-1106, tts-1-hd, and tts-1-hd-1106.  Voice options include Alloy, Echo, Fable, Onyx, Nova, and Shimmer.   Output formats include MP3, OPUS, AAC, FLAC, WAV, and PCM.

Kognetiks Chatbot - Updated Voice Settings
Kognetiks Chatbot – Updated Voice Settings

Image Settings

The image settings allow you to specify the default image model, output format, size, quantity, quality, and style.  Image dall-e-2 and dall-e-3.  Output format is currently limited to PNG with other formats available when they are released.  Image sizes are limited based on the default image model selected, for example dall-e-3 starts with 1024×1024 and includes 1792×1024 and 1024×1792.   If using the dall-e-2 image model, you can generate 1 to 10 images, however with the dall-e-3 model you can generate only one image at a time.  Style output – either natural or vivid – is only an option for the dall-e-3 model.

Kognetiks Chatbot - Updated Image Settings
Kognetiks Chatbot – Updated Image Settings

The settings will continue to evolve as options become available via the API.

Controls Moved – Read Aloud

You may have already noticed that I’ve moved the controls for send, trashcan, file upload, and read aloud to below the message input.

Kognetiks Chatbot- Chat Controls
Kognetiks Chatbot- Chat Controls

A newly introduced feature is “Read Aloud” (the last icon on the right) which will read the last entry in one of several voices and generate an MP3 file which can be downloaded and saved locally.  In addition to auto-playing the text-to-speech that was generated, there is also a link to “Listen”.  Testing on an iPad and an iPhone restrict auto-play, hence the inclusion of the link to the MP3.

Thank You to Contributors

The evolution of the Kognetiks Chatbot for WordPress plugin is driven by the rapidly growing installed base of users, which I now estimate is close to 2,000 active installations.  I want to thank each of you who have taken the time share your valuable feedback.  Keep it coming as it helps me continually improve and provide a better experience for all our users.

#ChatGPT #WordPress #WordPressPlugin

About the Author

Stephen Howell is a multifaceted expert with a wealth of experience in technology, business management, and development. He is the innovative mind behind the cutting-edge AI powered Kognetiks Chatbot for WordPress plugin. Utilizing the robust capabilities of OpenAI's API, this conversational chatbot can dramatically enhance your website's user engagement. Visit Kognetiks Chatbot for WordPress to explore how to elevate your visitors' experience, and stay connected with his latest advancements and offerings in the WordPress community.

Stephen Howell
Stephen Howell is a multifaceted expert with a wealth of experience in technology, business management, and development. He is the innovative mind behind the cutting-edge AI powered Kognetiks Chatbot for WordPress plugin. Utilizing the robust capabilities of OpenAI's API, this conversational chatbot can dramatically enhance your website's user engagement. Visit Kognetiks Chatbot for WordPress to explore how to elevate your visitors' experience, and stay connected with his latest advancements and offerings in the WordPress community.
Posts created 84

Leave a Reply

Related Posts

Begin typing your search term above and press enter to search. Press ESC to cancel.

Back To Top