This year I focused on learning as much as I could in areas related to my current job, which is programming. More specifically, I wanted to ramp up my skills in blockchain and artificial intelligence. As some of you probably know already, I'm constantly active in a few blockchain projects out there, so I'm relatively up to date with the latest developments in that space. I couldn't say the same about my AI expertise, though, at least at the beginning of the year.
Fast forward 9 months, and I have a few artificial intelligence courses under my belt (for those of you interested in certifications, you can see them on my LinkedIn profile). Although it was quite tough for me, as it touched on a few areas where my knowledge was sparse (like calculus and advanced mathematics), I pushed through and I'm happy to report that the experience was more than rewarding. I thoroughly enjoyed learning about machine learning, deep learning and artificial intelligence in general, and I can say now that my expertise there is on par with, if not better than, my blockchain knowledge.
But learning something without applying it is futile, to say the least. So, after building a small time series project (predicting crypto prices from an eclectic set of parameters), I decided to turn to an area of artificial intelligence closer to this blog, namely text processing.
The Experiment: Using Artificial Intelligence To Generate Blog Posts
I'm sure you've already heard the claims that AI bots are writing comments on social media, or even generating news stories. To be clear, we're still quite far from generating semantically correct text with artificial intelligence, but these projects can indeed do something that may be useful, or at least interesting.
For the tech savvy, I put all the technical details in the last section, The Technical Box, so feel free to jump there if you want to know more.
Let’s move forward.
For the purpose of this experiment I extracted all the articles on DragosRoua.com (907 posts, 1,210,085 words), including some markup (mostly added by plugins that I used at some point in the past).
This was the training corpus. I instantiated a model (again, details in the technical box) and started to train it in two different modes: character generation and word generation.
In the first case, after training, the model generates text one character at a time (including spaces, punctuation marks and markup). In the second case, the model generates text one word at a time (all words are lowercased first, so I needed to edit the output a little).
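The difference between the two modes comes down to how the corpus is tokenized before training. Here is a minimal illustrative sketch in Python (this is not textgenrnn's actual internal code, and the corpus string is a made-up stand-in):

```python
# Illustrative sketch of the two tokenization modes. Character-level
# treats every character (spaces and punctuation included) as a token;
# word-level lowercases the text first and splits it into words.
corpus = "The 80/20 rule of yourself. The rule is simple."

# Character-level vocabulary: small, case-preserving.
char_vocab = sorted(set(corpus))

# Word-level vocabulary: larger tokens, but everything is lowercased,
# which is why the word-generated output needs re-capitalizing later.
word_vocab = sorted(set(corpus.lower().split()))

print(f"{len(char_vocab)} character tokens, {len(word_vocab)} word tokens")
```

This also shows why the word-level output needed manual editing: "The" and "the" collapse into a single lowercase token, so sentence capitalization is lost.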
Without further ado, let’s see some generated text.
It’s a very good thing to do it and think about that time in the middle of the contrast was a few posts and sometimes started to find out of the self-esteem and the same sensation of my blog posts in the first wordpress plugin. I will be experiencing a little bit of the results. And the social structure of the same things are stopping to the satisfaction of the month. And the transcontine country, the most surprising activity for exact time as a successful business setup involved in the morning process is a good vibration of the content in the next one.
It doesn't really make any sense, so to speak, but it is a piece of text that has the “color”, the “vibration”, the topical coverage of what I write about on my blog.
It has words like
- wordpress plugin
- business setup
- social structures
And so on. These are topics that I wrote about extensively.
But it also has words that don't exist, like “transcontine”. To be honest, I find that word fascinating, in a good way. It may yet become an English word, if enough people start using it.
< h3 > The 80 / rules of yourself < / h2 >
If you ask me, I may not even notice the pictures of the ball of going to realize that the other kids were shouting at me , and I was still , because of the pain I filled with the rhythm, I had to make some sort of new friends. It’s the same with the rules. The problem is that you have to take who you want to tell. And, at some point, I had to face the feeling of the revealing up and then everything was done. The big picture was somewhere in the middle of nowhere, but the fact that I value creation a lot from time to time .
The above text is slightly edited: I capitalized the beginnings of sentences and eliminated some white space around punctuation. But other than that, it's just as it came out.
As you can see, with this generation mode the sentences almost make sense. I especially like this one: “And, at some point, I had to face the feeling of the revealing up and then everything was done.” It actually has a certain meaning to it.
Also, this type of generation has a lower frequency of the key words I write about, but a more robust syntax and more plausible semantics.
The generator also outputs markup, like <h3>, but it doesn't close it correctly. I suspect this is due to insufficient training; from what I read in the model's documentation, with sufficient training the generated text becomes syntactically correct not only for markup, but even for LaTeX or XML.
Conclusions And Potential Use Cases
The first and most puzzling thing that came to my mind was: “This is the end of AdSense”. I am old enough to remember what MFA means: Made For AdSense. Ten years ago, some people made a lot of money with sites that weren't of any use whatsoever to a normal person (really, they didn't make any sense, just piles of keyword-stuffed text) but which fooled AdSense into serving ads on them.
With these artificial-intelligence-based text generation tools, I can easily imagine an “MFA on steroids” wave of sites that will “seem” to make sense, but are really just generated piles of text from a heavily trained model, with the “color”, the “vibration” and the topical relevance of a legit site. Way more complex and harder to identify than the blatant copy-and-paste of old-style MFA.
It's true, things have changed since the first MFA wave, and Google AdSense policies have changed for the better. I'm sure they now use some AI themselves to determine whether a site is legit or not. But even so, there will still be a peripheral zone in which shadier ad agencies will make heavy use of this technology.
For the record, I want to be wrong about this, I really do.
The second thing that came to my mind was the question: “how can we actually make use of this, in a genuine way?”.
And here are my thoughts about it:
1. Assess The “Style” Of Your Blog
As you saw, in both cases (character generation and word generation) the generated texts have the vocabulary and the “style” of the original text. I find this extremely useful because it gives a window into how the blog “really sounds”, without touching the semantics. It's like getting to hear the “tune” of your blog, without the lyrics.
As you keep writing, something personal will start to emerge from your posts. Something that will make the writing on that blog “you”. Reading, from time to time, some AI-generated text trained on your own writing will give you useful insights. Is this how you'd like your blog to “sound”? Is the “tune” harmonious, or do you touch too many different notes at once? Are you pleasantly touched by the feelings you get reading that text, or repelled?
2. Find Inspiration
The second way in which this can be genuinely useful is to find inspiration. I don't suffer from this problem, but that's because I'm fortunate: I chose to write about whatever I want, whenever I want. I'm well aware that other blogs out there are trying to find a niche and stick to it, and in that process, finding new topics to write about can be a tedious task.
I truly think these texts hold a lot of hidden gems. The apparently random associations of words, the skewed syntax and the weird “patterns” will certainly generate some inspiring sequence, trigger some “a-ha” moments, or just make you think about a bunch of “what ifs”. And that's the best thing you get out of brainstorming, which, as we all know, is kinda difficult to perform with only one brain involved. Yours, that is.
Last, but not least, I confess that I’m having loads of fun just playing with this tool. There are some sentences that are just plain hilarious. Like this one, for instance:
I don’t know if you are serious about my blog, I perfect that I am from a much more than 1.000 usd .
Or this one:
I don’t know that more than 100 ways to sell your personal head.
Or obviously, this one:
I wash the same time with a fantastic person between the head of the topic of people consistent in that situation.
So, I think this could evolve into some sort of a fringe entertainment area, where people will search for funny texts with the “taste” of reddit, or Steve Pavlina, or, why not, Dragos Roua.
The Technical Box
The model I used is textgenrnn (the name stands for “text generation using recurrent neural networks”) by Max Woolf. It's a Python implementation based on the char-rnn model by Andrej Karpathy (currently head of AI at Tesla). For those interested in reading more, here's a link to Andrej Karpathy's original blog post.
I used Google Colaboratory for training, with the following parameters:
- Character generation:
– 3 layers of 128-cell LSTMs, 7 training epochs, final loss 0.92
- Word generation:
– 4 layers of 128-cell LSTMs, 10 training epochs, final loss 2.26
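For reference, these parameters can be written down as the kind of configuration dictionaries the textgenrnn Colaboratory notebook uses. The exact key names below are an assumption based on that notebook, not something verified here:

```python
# Hedged sketch: the two parameter sets above, expressed as textgenrnn-style
# config dictionaries (key names are assumptions from the Colab notebook).
char_model_cfg = {
    'word_level': False,  # character generation
    'rnn_layers': 3,      # 3 LSTM layers
    'rnn_size': 128,      # 128 cells per layer
}
char_train_cfg = {'num_epochs': 7}   # reached a final loss of ~0.92

word_model_cfg = {
    'word_level': True,   # word generation (corpus is lowercased first)
    'rnn_layers': 4,
    'rnn_size': 128,
}
word_train_cfg = {'num_epochs': 10}  # reached a final loss of ~2.26
```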
I kept the generated weights for both models and intend to reuse them in other projects. If you try this, I suggest you do the same, as training time can be significant (between 1-2 hours and 4-5 hours on a serious corpus of texts, with serious parameters).