Jordan Choo: We're going to be chatting with Kristin about AI, automation, and marketing. Did I miss anything, Kristin?
Kristin Tynski: Oh, no, you got it all. Yeah, I'm excited to be here. Thank you guys for having me.
Jordan Choo: Yeah, no problem. So one thing that really caught my eye recently is the publication of This Marketing Blog Does Not Exist. I feel like you've been asked about it a whole lot.
Kristin Tynski: That was the point. I made it to start a new conversation about these new technologies.
Jordan Choo: Nice. So could you walk us through the purpose behind it and the process of creating it and all that?
Kristin Tynski: Yeah, sure. So maybe a year or two ago, a site called This Face Does Not Exist was released. Essentially, you could refresh the page as many times as you wanted, and it would show you a new face that was completely generated by a GAN, a generative adversarial network model for generating human faces.
Jordan Choo: So now that we've dealt with the technical issues that we can never escape, as we all know, my initial question to you, Kristin, is: what was the inspiration and the process behind This Marketing Blog Does Not Exist?
Kristin Tynski: Yeah, so the inspiration was a site that came out a few years ago called This Face Does Not Exist, I think, or, sorry, This Person Does Not Exist. It used a state-of-the-art image-generating model to create realistic human faces that were essentially indistinguishable from real ones. That was one of the first times I saw something from this new class of AI that blew my mind and made me feel like things are really changing quickly. There are a lot of new and interesting implications, and some of them could be highly beneficial, while others could be scary and hurt industries like content marketing. Ever since I saw that, it was just kind of stuck in my head. I've also been following advances in natural language processing, and over the last couple of years, new network architectures have come along that have been the next step in the evolution of text generation and other natural language processing tasks. They're called transformer networks. The first one that made big news is called GPT-2, which was released by OpenAI. They didn't release the full version of it; they released a partial version that was not as powerful as the full version they had used as the basis for their paper and for the text generation examples they showed the press, and that got them a lot of news.
Jordan Choo: Right, I remember that one. They said it was too powerful to release, and they didn't want bad actors getting their hands on it.
Kristin Tynski: I don't know how much of that was marketing. But this was, I think, at the beginning of this year that it came out, and the generated text passages were an order of magnitude better than anything else that had been released. They first released a model that was, I think, a quarter the size of the one they had generated this amazing text with. Later on, they released one that was about half as powerful, and now I think there's one that's about three quarters as powerful. But in the meantime, between the first release and today, a bunch of other companies released their own transformer networks for text generation and NLP. One of them is called Grover, which was created by Rowan Zellers at the Allen Institute for AI. This one was trained on a different data set, a news data set, and was also a very large transformer model that was nearly equivalent to the unreleased GPT-2 model. At least that's my understanding of it; I could have that partially wrong. But it generated text that, in my opinion, was significantly better than what the partial GPT-2 models that had been released could do. They also released a web interface for it, which was really cool, so you could play with it without having to install their code libraries and try to run it yourself on your own machine or a virtual machine. When I saw that and realized how good the text generation was, it really made me want to do more with it. So I approached Rowan Zellers, who was offering to release the full model to people who wanted to play with it, and I got access to the full model. He also released his code base, so we set up the full model with that code base on our own servers, and then we used it to generate about 600 articles based on marketing content.
So essentially, what I did was scrape a list of the most popular topics in SEO, content marketing, social media marketing, and a few other sub-areas of digital marketing. Then, for each of those, I found the most popular articles based on social sharing; I just used BuzzSumo to pull those articles. Then I used those titles as the prompts for Grover to create the text for the pages. That was relatively straightforward. It was a little difficult to set up the code to work properly, but I got it working, generated about 600 articles, like I said, and just loaded them into a blog. That was honestly the simplest part: setting up a super easy WordPress install and then adding a plugin to bulk upload the articles. And that was basically it.
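The prompt-selection step described above (take the most-shared titles for each topic and use them as generation prompts) can be sketched in a few lines of Python. This is an illustrative sketch, not the actual code behind the blog: it assumes the share data has already been exported into a list of dicts with hypothetical `topic`, `title`, and `shares` keys, and it stops at building the prompt lists, since the Grover generation step depends on that project's own scripts.

```python
from collections import defaultdict


def top_titles_by_topic(articles, per_topic=3):
    """Group articles by topic and keep the most-shared titles in each.

    `articles` is a list of dicts with hypothetical keys "topic", "title",
    and "shares" (for example, rows from a social-sharing export).
    Returns {topic: [title, ...]} ready to be used as generation prompts.
    """
    by_topic = defaultdict(list)
    for row in articles:
        by_topic[row["topic"]].append(row)

    prompts = {}
    for topic, rows in by_topic.items():
        # Highest share count first; sorted() is stable, so ties keep input order.
        ranked = sorted(rows, key=lambda r: r["shares"], reverse=True)
        prompts[topic] = [r["title"] for r in ranked[:per_topic]]
    return prompts


if __name__ == "__main__":
    sample = [
        {"topic": "SEO", "title": "How to Audit Your Backlinks", "shares": 1200},
        {"topic": "SEO", "title": "Title Tags That Work", "shares": 3400},
        {"topic": "Social", "title": "Instagram Filters for Brands", "shares": 900},
    ]
    print(top_titles_by_topic(sample, per_topic=1))
```

Each title in the result would then be fed to the text generator as the start of an article.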
Jordan Choo: Well, that is crazy.
Kristin Tynski: Yeah, I mean, it was a really simple thing, and I can't take too much credit, because all I really did was get the model, install it, run it, and put the output into a blog. But I really wanted to do that, because I wanted to start a conversation about the implications of these new technologies for content marketing and for SEO. There are also larger, maybe even more important implications having to do with the role of humans in creating text content, and really media in general, in the future, and also issues around fake news, propaganda, and troll armies from other countries trying to influence us. All of these things can be used for good and bad purposes. But for a content marketer and an SEO like me, the conversation I really wanted to have was: what happens when these technologies become easier to use and more ubiquitous, and you have a lot of gray hat and black hat SEOs who leverage them to essentially flood the internet with a ton of computer-generated, low-quality content? What happens to Google in that case? And what happens to search in general when there's potentially more computer-generated content on the internet than human-generated content? That was the impetus; that's why I wanted to do it and start a conversation around it, and I think it's done that. But the real heroes or villains, however you want to look at it, are the people who are creating these state-of-the-art models.
Noah Learner: Right, right.
Jordan Choo: And what do you see that future being? Say a bunch of gray hats and black hats release a ton of computer-generated text. Do you see Google even being able to differentiate that? Or even manual reviewers being able to differentiate it?
Kristin Tynski: Yeah. At this point, the newest generation is getting so good. The most recent one, which I can talk about later, is a new one that Salesforce put out called CTRL. We've been playing with it a little over the last few days, and the output from it is, in my opinion, the best that I've seen. If I had a passage written by it and compared it to a passage that was human written, I don't think I'd be able to tell you which was which with much better than 50/50 accuracy. If that's the case, then human reviewers don't do us a whole lot of good. Of course, it might be different depending on the domain; some things would be a lot easier to generate plausible text for than others. But yeah, I think there are probably a lot of scary implications for human review. As for Google adding something to their algorithm to identify it, I think it'll become an arms race, and there will be new models coming out all the time that can pass the tests of whatever verification models Google decides to implement, or not. The Grover model was actually positioned as a fake news checker, or identifier, and that's part of what it does: you can sort of run it in reverse, and it will tell you whether text was human generated or computer generated. But it's relatively easy to fool. Even something as simple as adding three periods at the end gets it all classified as human. So there are a bunch of ways to get around it, and I can't imagine it would be very easy for Google to come up with a foolproof method for defeating existing and all future iterations of these new NLP models that can generate almost human-quality text.
And even if they did, I'm not sure how feasible it would be, given that they would have to run these models over everything they crawl, which computationally would be really expensive, I would think.
Jordan Choo: Yeah, you would need a few server farms.
Kristin Tynski: Right. I think there are also indications that they're not great at identifying really early iterations of computer-generated text that people still use, like article spinning, where it's just a Mad Libs-style find-and-replace situation, or text summarization that uses AI models to pull out pieces of an article and put them together in a new order in a shorter form. Those sorts of things still work as SEO hacks. So if they haven't caught up to that level of sophistication, it doesn't seem realistic that they're super close to being able to implement something that can catch this stuff, at least at scale, right now. But I don't know; that's pure speculation. Maybe someone at Google would know.
Noah Learner: So can Grover write my kids' term papers?
Kristin Tynski: I don't know, what grade are your kids in? Sixth? Probably, yeah. I think you could definitely get a passing grade on a book report written by any of these models. I'm really not kidding. In fact, some of them might actually raise the teacher's suspicions, because they'd be more sophisticated than what the kids would write. Some of them are really good. And because they're trained on so much data, they pull out the most salient pieces of a topic, so the result wouldn't just be syntactically and grammatically correct; it would also include the right information.
Noah Learner: I was shocked by how good your blog was. I felt like it was a little keyword-stuffy, but really just barely. As I kept reading it, sentence after sentence, I was like, wow, it feels like a person wrote it. It was shocking.
Kristin Tynski: Yeah, to me it was totally shocking as well. And if you really pay attention to what's being written, you start to notice additional things that the model is doing. I pinned one of the first articles, I think maybe it's the main one, because I was so impressed with it. It's about the five Instagram filters that are best for content marketing.
Noah Learner: Can you show us what it looks like?
Unknown Speaker: Just one second.
Noah Learner: Sorry, I was moving around the room. Guys, this is the first time I've ever had to struggle with getting our Hangout to work. We were trying to get it to publish live to YouTube, and we gave up.
Unknown Speaker: Alright, can you see?
Unknown Speaker: Yep.
Kristin Tynski: Oh, the other thing I forgot to mention is that I used StyleGAN, the model that was used to make This Person Does Not Exist, to create fake authors, so their faces were created by an AI too. Anyway, what I thought was so interesting about this one, on which photo filters are best for Instagram marketing, is that it does a bunch of really cool things. First, you'll notice it comes up with names for things. It realizes that this would make sense coming from some sort of marketing blog, and it says, "Tech-focused publication MobileSyrup asked a bunch of Instagram artists for their favorites." MobileSyrup was just a publication name it made up, but it makes sense as one. Then it pulls out these people it says are influencers. Some of them are real people, but some of them aren't. Fredo Santana, for example, is actually one of the keywords I noticed this page started ranking for early. And then Lexi Flora. So it makes up the names of these people, and then it makes up the names of the bands they belong to, and most of these bands don't exist. Then, for each of these influencers, it talks about which filters they like most, and it makes up a bunch of filter names that sound like they could be real filters, like Meta Val and Rose Mist, and for some of them it also describes what the filter does. So there's a lot of understanding built into this. The structure is similar to what you would find in a news article, in that it's interview style, and each paragraph is a subsection about an individual influencer it made up: who the influencer is, why they're famous, what filter they like, and why it would be applicable. I just found that really compelling.
It's a good illustration of the depth of understanding these new transformer models have. They're not just predicting from one word to the next; they form an understanding across larger distances of text, so they can refer back to things, and they can have an organizational structure that isn't restricted to just a few sentences. They'll also call back to things from really early on in the article. That, to me, was one of the first things I noticed about the articles it produced that I thought was really interesting.
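The long-range "call back" behavior described above comes from self-attention, where every position in a text can attend directly to every earlier position. The following is a minimal single-head sketch of scaled dot-product attention with a causal mask, written in Python with NumPy. It is a simplification for illustration, not the code used by GPT-2, Grover, or CTRL: real models stack many multi-head attention layers with learned weights, and the random matrices here are placeholders.

```python
import numpy as np


def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with a causal mask.

    x:             (seq_len, d_model) token representations.
    w_q, w_k, w_v: (d_model, d_head) projection matrices.
    Returns (output, weights); weights[i, j] is how much token i attends to token j.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])

    # Block attention to future positions, since text is generated left to right.
    seq_len = x.shape[0]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Row-wise softmax; subtracting the row max keeps exp() numerically stable.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(6, 16))  # 6 tokens with 16-dimensional embeddings
    w_q, w_k, w_v = (rng.normal(size=(16, 8)) for _ in range(3))
    out, weights = causal_self_attention(x, w_q, w_k, w_v)
    print(out.shape)  # (6, 8)
    print(weights[-1].round(2))  # last token's attention over all 6 positions
```

Each row of `weights` is a probability distribution over the current and earlier tokens only, so the last token can put weight on the very first one regardless of how far apart they are.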
Jordan Choo: For sure. Now, you brought up the CTRL repo by Salesforce, and you recently published a very interesting post on Traffic Think Tank about a Trump tweet bot.
Unknown Speaker: Yeah.
Jordan Choo: Would you care to share it for people who haven't seen it?
Kristin Tynski: Yeah. So
Unknown Speaker: South Georgia to talk with him? Let him
Unknown Speaker: know, talk with him?
Noah Learner: Because he did something similar? No, I want to hear yours, though. I'm sure it's hilarious. And your language model is newer, so it's probably much bigger and better than mine.
Kristin Tynski: The Salesforce team built it. But yeah, they released it maybe three weeks or a month ago, and as far as I know, it's still the largest transformer model. I think it's 1.6 billion parameters, which is slightly larger than the GPT-2 model from earlier this year that is still unreleased. So as far as I know, it represents the state of the art for text generation. Then earlier this week, actually six or seven days ago, they updated the code base to include a new module for fine-tuning it. Essentially, you can take the pre-trained 1.6-billion-parameter model they released and fine-tune it with text of your choice. In this case, I set up the model on a virtual machine. You have to use a GPU virtual machine, which was actually a bit expensive to run, just because the model is so big. Then I fine-tuned it on a pretty large corpus of transcripts of Trump's speeches. The first night I trained it for an hour or two, and over the next day or two I trained it a lot more. At least in my opinion, it does really, really well at mimicking him. Sorry, I can actually share it, if you just give me one second.
Noah Learner: I'm hoping you can talk with us about getting sentiment out of content, because I used a pretty similar API; I think there's a Trump speech API. I was trying to figure out how to get sentiment out of the speeches, and I struggled with pulling out phrases versus just words minus stop words. I feel like there's tension between the two, and you kind of lose meaning at some point.
Noah Learner: Let me just close every computer. So sorry, guys, this has been
Jordan Choo: Oh, no worries. It's Murphy's Law.
Noah Learner: So much smoother than this. I freaked out, because when we started the meeting, there was a big orange button that said "Broadcast now." I hit that and we started talking, but the button didn't go away. So I thought, oh, I'll leave the meeting and rejoin. When I rejoined, Zoom gave me a 429 error telling me that I couldn't connect. So then I had to switch over to the iPad, and then my computer was doing whatever it was doing. So sorry. Anyway, so there's tension. Sorry, go ahead.
Kristin Tynski: I'm sorry, go ahead.
Noah Learner: I was just going to say there's tension between entities and adverbs and adjectives. I feel like you lose meaning when you pull out some of the stop words. When you're trying to come up with ad copy, meta descriptions, page titles, or whatever, there's a point at which you start to lose valuable stuff. I'm wondering how you think through that.
Kristin Tynski: I've done some sentiment analysis, but I haven't done a whole lot with it. I know there are a lot of different options that work with varying degrees of success, but to me it still seems pretty rough. It hasn't really advanced to the point where it truly captures sentiment that well. We've done a bunch of projects at Fractl that have incorporated sentiment analysis in some way or another, and oftentimes, when you spot-check it, it feels like it's looking at individual words and not really understanding the context or the multi-word segments very well. So I haven't been super impressed with the utility of sentiment analysis.
Noah Learner: It's just interesting. There are review tools that pull in reviews and use Watson to process them, and they're saying, hey, we're seeing these entities show up in reviews. It might be taco, or car, or customer service. But it doesn't really give you meaning about them or sentiment about those entities; it just gives you the entity name. I build Data Studio reports about it, and I feel like there's just not enough meaning in it unless you really play around with it, and then all of a sudden you'll find that magic sweet spot. But it's not something you can transfer from one business or vertical to another, so it's frustrating.
Kristin Tynski: Yeah, I haven't seen anything, and that doesn't mean it doesn't exist, that can effectively understand sentiment about specific entities within the context of a sentence, at least not very effectively. So, as Jordan asked, I'll share my screen and show you. Actually, I didn't post this one yet on Traffic Think Tank; I don't want to keep posting all this stuff on there. Anyway, one second.
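The tension discussed here, where single-word analysis with stop words removed loses meaning that multi-word segments keep, is easy to see on a toy review. The Python sketch below uses a tiny hypothetical stop-word list (real lists, such as NLTK's English list, are longer and also include negations like "not") and contrasts single-word keywords with bigrams.

```python
import re

# Tiny illustrative stop-word list; note that it includes the negation "not".
STOP_WORDS = {"the", "a", "an", "was", "were", "is", "and", "of", "to", "not", "very"}


def tokens(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keywords(text):
    """Single words with stop words removed (the lossy approach)."""
    return [t for t in tokens(text) if t not in STOP_WORDS]


def bigrams(text):
    """Adjacent word pairs, keeping stop words so phrases like 'not good' survive."""
    toks = tokens(text)
    return [" ".join(pair) for pair in zip(toks, toks[1:])]


if __name__ == "__main__":
    review = "The tacos were not good"
    print(keywords(review))  # ['tacos', 'good']
    print(bigrams(review))   # ['the tacos', 'tacos were', 'were not', 'not good']
```

Dropping "not" turns a negative review into the keywords `tacos` and `good`, while the bigram `not good` keeps the meaning. Bigrams do not solve entity-level sentiment on their own, but they show why word-level approaches miss context.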
So I fed it the prompt "I shouldn't be elected for a second term as president," and this is what it wrote.
Can you guys read that? Or do you want me to read it out loud?
Noah Learner: Can you zoom in a little?
It's super long. It's amazing. That's how he would talk, right? Yeah.
Kristin Tynski: Yeah, and it even includes things like audience reactions, because the training corpus had audience reactions in it. You can see in the fourth line there's booing, and then him reacting to the booing.
Jordan Choo: That's really impressive.
Kristin Tynski: Yeah, honestly, if I showed this to either of you and told you it was part of a Trump speech from yesterday, I don't think you'd come back to me and say, I think this was written by an AI.
Unknown Speaker: No, not at all. So do you plan on rerunning your This Marketing Blog Does Not Exist experiment with CTRL?
Kristin Tynski: Probably not, because CTRL is the much larger model, and it takes a lot longer to create text. It would probably take a few weeks to generate a few hundred articles, and it would probably cost at least a few hundred dollars. I don't know. It's tempting to see if there is any improvement over the Grover model that I used for This Marketing Blog Does Not Exist. Maybe in the future, I don't know. I have some misgivings about putting a lot more of this kind of text onto the internet, outside of the goal of publicizing the scariness of this technology. I've actually already seen a couple of companies using GPT-2, not Grover, one of the larger GPT-2 models, to sell automated text, or automated articles, essentially. So it's already beginning; there are already a couple of services that provide that, which to me is kind of crazy. I think they sell the articles for about 50 cents apiece, for 500-word articles. I mean, they could be creating millions a day.
Noah Learner: Wow.
Jordan Choo: scary stuff.
Kristin Tynski: Yeah, yeah.
Noah Learner: So what lessons can you teach us about scaling an agency? You've done it twice.
Kristin Tynski: Yeah. I started my first agency with my brother and my brother-in-law in 2007, right out of college. It was a content marketing agency, but the focus was really on infographic creation, back when that was the thing that worked really well for link building. After operating that for three or four years, we decided to sell it to a company called BlueGlass in Tampa and become part of their team. That didn't end up working out the way we thought it would or should, so we left after about a year, took the lessons we learned through that process, and decided to start a new agency. The agency we started is Fractl, which is about six or seven years old now. We are a content marketing agency, and our focus is on data-journalism-driven link building: trying to generate press through the vehicle of data-driven storytelling. Our agency is essentially made up of content creators and content promoters. In my opinion, the two biggest factors in helping us grow are, first, what differentiates us: the quality of our work, our processes, since we do more than just content creation and also promote that content, and our results. Second, we do a lot to generate inbound lead flow. That really comes down to doing things like This Marketing Blog Does Not Exist, which are, at least in our opinion, thought leadership pieces for the industry that try to start conversations, and also doing a lot of work exploring the aspects of our business that really matter to success. What goes into creating content that can really get news pickups? What are publishers interested in? How do you pitch publishers properly? How do you leverage data, data science, and machine learning to come up with new, newsworthy stories that other people haven't done before?
And then how do you pair that with digital PR to effectively get it picked up by the press? And how do you support that with additional outreach and syndication support to make sure you get the maximum value possible out of the work you've done?
Unknown Speaker: It feels to me like, between creation and promotion, there are so many different ways of thinking needed to do all of that, and so many different types of brains needed to execute it well. How did you guys split up the roles? How does your executive team split that up, because writing is so different from data science? You know what I mean?
Kristin Tynski: Yeah, I would say our creative team has had to become proficient in storytelling and also data analysis. That's really the role of a data journalist, and that's how we think of our content creators here. Their main purpose is to tell a story, but they're doing it through data. There's been a learning curve to that, but we've been working on it pretty much the entire time we've been running Fractl, and we've developed a team that has become experienced in doing it. Through iteration over time, we've learned what works well and what doesn't: what sorts of projects require more effort than they're actually worth, which ones have a high ROI for the amount of effort required, and what sorts of topics and data-driven content executions resonate well with the people we pitch. So a lot has gone into getting it right over the long term, improving, and being able to do something that can scale and offer similar results across a lot of different clients. In terms of our team, I'm really lucky that it's a family business. The people who own it are me, my brother, my brother-in-law, and my partner. That has made things a lot easier, because everyone's really aligned all the time; we're all working toward the same goal, and we all have some familial connection. But working with family can also be hard. So it's really important to divide and conquer, for each partner at Fractl to focus on a different area of the business, and for there not to be a huge amount of overlap between us, except when we come together to make decisions that affect the entire agency, set goals, and things like that. I'm not the CEO; I run the creative team at Fractl.
My brother runs sales, and my brother-in-law, Nick, is the CEO, so he handles all the CEO responsibilities. My partner, Kelsey, manages the outreach team and the growth team. But we all contribute to other pieces too. I like to contribute a lot to the thought leadership and outreach, because it allows me to stretch my creative muscles and explore new methodologies that we maybe wouldn't use for clients. Because we have more time to do things for ourselves, I can do more experimental things and try to find answers that aren't easy to come by, that we're interested in as an agency, and that we think the greater SEO and content marketing industry would also be interested in.
Noah Learner: What parts of the business have you found easiest and hardest to manage or scale?
Kristin Tynski: I don't think any of them have been easy to scale, really, because the type of work we do is content creation. Each individual project probably requires 40-plus hours of work from our content creators, and then maybe 30 to 40 hours of work in pitching. It's a lot, because it's one-to-one pitching: trying to find the right home for something that we consider, and hopefully the publishers we're pitching consider, interesting and newsworthy. You have to approach it in a very high-touch way. We've done a lot of work improving internal processes to try to remove bottlenecks and maintain quality assurance, so there are efficiencies that can be gained process-wise internally. But in terms of the work itself, there are limits to how low you can go, just because the nature of the work is pretty time intensive.
Noah Learner: Hi, original question. I'm sorry.
Noah Learner: Yeah, no, that's pretty amazing. I spent some time on your LinkedIn profile before the call, and I loved how you had those two links at the top of your profile and how you link right into the deck. I thought that was really cool. One of the things I remembered from the deck was that you average something like 90 backlinks per project. I'm just doing the math: if you're doing 30 hours of pitching and you're getting 90 links, that's pretty fascinating. That's so much work. Content promotion is the thing that, in my small agency, I just don't have enough time for. I can dedicate some time per client per month, but not enough, and I know that's where the rubber meets the road. We've automated so many processes so that we'd have more time for things like content promotion. Sorry to ramble at you, but what have you guys been able to automate that you're really proud of?
Kristin Tynski: In terms of automation, it really has mostly to do with workflow. We leverage a lot of off-the-shelf tools, like our software-as-a-service project management tool, and that's worked great for us. We use Slack internally. And we use a tool called Klipfolio to monitor our productivity and the quality of our work. We've built dashboards for each segment of our business so we can understand the promotions team and the production team independently. We have metrics on how many pitches go out each day, what the success rates of those pitches are, and what the placement rates, rejection rates, and response rates are. We can look at that as granularly as each individual MRA, or media relations associate, as we call them; we can look at it by account team; and we can look at it by client. From that, we can extrapolate and understand benchmarks across the entire agency, across account groups, and across clients, which helps us set expectations for those team members and measure their work against those expectations. On the content side, we have something very similar; it's just that the metrics we use to evaluate our creatives and our efficacy are based on the quality of the content and the speed at which it can be produced. So we set benchmarks for how long an individual project should take, what the average and median are for an individual team member across a quarter or half a year, what that looks like by account team and by client, and lots of other metrics about the quality of the work. We do two internal peer reviews per project, one at an early stage and one at the final draft stage, and at both of those stages our senior creatives are scoring and recording the quality of those pieces of content.
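The pitch metrics listed above (response, placement, and rejection rates, sliced by MRA, account team, or client, with agency-wide benchmarks) boil down to grouped ratios. Below is a plain-Python sketch of that arithmetic. It is not how Klipfolio works internally; the field names `mra`, `client`, and `outcome`, and the three outcome values, are hypothetical, and using the median across groups as the benchmark is one reasonable choice among several.

```python
from collections import defaultdict
from statistics import median


def rates_by(pitches, key):
    """Response, placement, and rejection rates grouped by `key`.

    Each pitch is a dict with hypothetical fields such as "mra" and "client",
    plus "outcome" in {"placed", "rejected", "no_response"}. A response is
    counted as either a placement or a rejection.
    """
    groups = defaultdict(list)
    for pitch in pitches:
        groups[pitch[key]].append(pitch["outcome"])

    stats = {}
    for name, outcomes in groups.items():
        n = len(outcomes)
        placed = outcomes.count("placed")
        rejected = outcomes.count("rejected")
        stats[name] = {
            "pitches": n,
            "response_rate": (placed + rejected) / n,
            "placement_rate": placed / n,
            "rejection_rate": rejected / n,
        }
    return stats


def benchmark(stats, metric):
    """Median of one metric across groups, used as the expectation to measure against."""
    return median(group[metric] for group in stats.values())


if __name__ == "__main__":
    pitches = [
        {"mra": "Ana", "client": "Insurer", "outcome": "placed"},
        {"mra": "Ana", "client": "Dealer", "outcome": "no_response"},
        {"mra": "Ben", "client": "Insurer", "outcome": "rejected"},
        {"mra": "Ben", "client": "Insurer", "outcome": "placed"},
    ]
    by_mra = rates_by(pitches, "mra")
    print(by_mra)
    print(benchmark(by_mra, "placement_rate"))
```

Calling `rates_by` with `"client"` (or an account-team field) instead of `"mra"` gives the other slices from the same data.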
And then that's part of one of the metrics we use to evaluate the creatives who produce them, on a quarterly or half-year basis. So we've automated the analysis part a lot. I think that's helped us understand where we need to work on things, where things are working well, and where things have improved. It gives us insights into how we can adjust internal processes to become more efficient and create better-quality work. In terms of automating the process of data journalism, or content creation using data, there are a ton of individual, specific things: dozens and dozens of scripts written by our development team to help automate parts of the data collection process for a lot of specific individual cases. And some are extensible. For example, we've done a ton of work with the Fatality Analysis Reporting System (FARS), a government data set that gives a huge amount of detail on fatal accidents. We've used this data set for a variety of clients in the car insurance and car sales space, so we put a good amount of time into decoding it and putting it in formats we can easily analyze. A lot of times there's hidden time and money spent in the data acquisition and data cleaning portion of the content production process, and you don't want to do the same thing over and over again. So where there are opportunities to simplify parts of those processes, either data cleaning or data acquisition, we do that. Then we can reuse those pieces of code later on: other team members can use them when they're doing similar projects, or even completely different projects that rely on the same data sets. So we do a lot of that. The other part is just trying to help our team level up. In the next year or so, we want our entire creative team to learn the basics of Python.
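Government data sets like FARS store many attributes as numeric codes, so the "decode it once, reuse it everywhere" idea amounts to keeping the code tables and the decoding step in one shared module. A minimal sketch using only the standard library; the column names and code tables below are placeholders invented for illustration, not the real FARS codebook, which should be taken from NHTSA's documentation.

```python
import csv
import io

# Placeholder code tables for illustration only; NOT real FARS codes.
CODE_TABLES = {
    "weather_code": {"1": "Clear", "2": "Rain"},
    "light_code": {"1": "Daylight", "2": "Dark"},
}


def decode_rows(rows, code_tables, unknown="Unrecognized code"):
    """Yield a copy of each row with coded columns replaced by readable labels."""
    for row in rows:
        decoded = dict(row)
        for column, table in code_tables.items():
            if column in decoded:
                raw = (decoded[column] or "").strip()
                decoded[column] = table.get(raw, unknown)
        yield decoded


def load_decoded(csv_text, code_tables):
    """Parse CSV text and return a list of decoded row dictionaries."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return list(decode_rows(reader, code_tables))
```

Because the code tables live in one place, a second project on the same data set imports `load_decoded` instead of re-deriving the mappings, and unexpected codes surface as an explicit label rather than silently passing through.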
It can massively simplify some of those early stages in the data acquisition and data analysis process. If our creatives can scrape data from the internet themselves, or transform and clean data themselves using pandas and Python, it can save 20 or 30 percent of the time they would have spent pulling it in and out of Excel or other tools that are not as extensible, not as fast, and generally more difficult to use than the basics of these Python libraries, which help you clean and manipulate data much better and faster. So...
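As an illustration of the kind of cleaning that replaces manual Excel work, here is a minimal standard-library sketch: it normalizes header names, trims whitespace, parses currency and percentage strings, and drops blank and duplicate rows. The field names are hypothetical. In pandas, the same steps map roughly onto `Series.str.strip`, `pd.to_numeric`, and `DataFrame.drop_duplicates`.

```python
import re


def parse_number(value):
    """Turn strings like ' $1,234.50 ' or '12%' into floats; return None if unparseable."""
    cleaned = re.sub(r"[,$%\s]", "", value or "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def clean_records(records, numeric_fields):
    """Normalize keys and values, parse numeric fields, skip blank and duplicate rows."""
    seen = set()
    cleaned = []
    for record in records:
        # Lower-case and trim header names; trim string values.
        row = {
            k.strip().lower(): (v.strip() if isinstance(v, str) else v)
            for k, v in record.items()
        }
        if not any(row.values()):
            continue  # skip rows where every field is empty
        for field in numeric_fields:
            row[field] = parse_number(row.get(field))
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # exact duplicate after normalization
        seen.add(key)
        cleaned.append(row)
    return cleaned
```

Deduplication runs after normalization on purpose, so `"$1,200"` and `"1200"` for the same city collapse into a single row.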
Noah Learner: That's amazing. I feel like a total newbie. The thing that's frustrating is you'll spend time on GitHub, find a repo you want to work with, pull it in, and try to run it, and the first two hours are just dealing with broken dependencies. It just drives me flipping nuts. And it's just...
Kristin Tynski: I know exactly how you feel. I'm a relative Python newbie too, and setting up these transformer models, like setting up Salesforce's CTRL model, was a giant pain in the ass. Even though it was decently well documented, there are just a lot of ways you can go wrong, and especially if you're relatively new to it, you don't have the intuitive understanding of why something might be wrong or what it is. So it can be a very clunky, frustrating process. But I've found that when I have something I'm really interested in seeing work, I have the dedication and fortitude to see it through, and through that process I learn a lot more. So for me, part of learning and getting better at Python is picking pet projects that I find fascinating and then just brute-forcing it until I get to where I want to be.
Noah Learner: Whoo. You know, I think I'm going to adopt that as my new superpower. I've told people that it's grit and not giving up, but I think it's more brute force.
Kristin Tynski: The key is that you have to be interested in the thing you're going for. I would have given up so long ago if I wasn't totally fascinated with these new technologies and wanting to gain the superpower they provide for myself. That's a compelling reason for me to learn something.
Noah Learner: Which lessons have you learned along the way doing all this AI and ML stuff? What expensive mistakes can you share?
Kristin Tynski: Um, don't accidentally leave an expensive multi-GPU virtual machine running on Amazon Web Services.
Noah Learner: Oh, yeah, I learned that lesson too.
Kristin Tynski: I did that, actually, all night last night. You click stop, and then it asks, "Are you sure you want to stop this machine?" I thought I had clicked it, but apparently I hadn't, and I found it still running this morning. So I think I spent like $75 overnight.
Noah Learner: Oh, I was going to ask you if you were over or under 200, because that was my expensive lesson.
Kristin Tynski: Well, my brother did it too, and his lesson was much more expensive than that. So I don't feel as bad, and I'm going to go back and tell him that he made a much bigger mistake. Yeah, that's a tough lesson to learn. Although I think Amazon and Google Cloud when that happens.
Noah Learner: Stepping back and talking about digital marketing agency life, can you share your off-the-shelf tools besides Klipfolio and Asana?
Kristin Tynski: Um, for SEO we mostly use SEMrush and Ahrefs. For social content data, BuzzSumo. For outreach management, we use BuzzStream. And then a lot of various small things for other tasks, but those are the main ones we use internally.
Noah Learner: And are you guys building lots of solutions internally? Besides Python, what language do you lean on the most to tie the APIs together?
Kristin Tynski: I mean, it's almost entirely Python, to be honest.
Noah Learner: Okay. Cool.
Kristin Tynski: Yeah, just because we do so much data scraping, data cleaning, and data manipulation. You can do that in a lot of languages, but the off-the-shelf libraries and documentation for Python are, in my opinion, better than for most other languages. And Python is used in machine learning so frequently that it's kind of the standard there. So we kind of double down on Python.
Noah Learner: Can you share how you scrape?
Kristin Tynski: Um, it depends on what we're scraping. There are a lot of scraping libraries you can use. We use one called newspaper, which works really, really well, and there's Beautiful Soup as well. I think different ones are better for different use cases. In some instances, there are built-in web tools you can add as a browser extension to do web scraping, and that can be easier than writing super complex custom code to go through a bunch of segments of each HTML page. Maybe if you were super professional at writing Python code, you could do it faster than with some of these browser plugin tools. It really depends on the case and the site. For example, we own a couple of websites that we do content marketing for ourselves, to drive some secondary revenue. We own one called lawsuit.org, and we're trying to build it up; we're doing some link building for it right now. I recently did a project scraping the public records database in Palm Beach County for traffic citations. Within the public records, if you look at an individual record, it shows the name and badge number of the police officer who pulled the person over and gave them the citation. So we had this interesting idea of scraping that entire set of public records for the county and then doing bulk data analysis of the individual police officers and their rates of citations based on race, gender, citation type, or location, to see if we could find dirty cops or racist cops in the data. We actually found some interesting things that we're going to be releasing soon. But the most difficult part of that process was writing the scraper, because the site had a bunch of things to try to prevent scraping.
Like CAPTCHAs, such as Google's "click here if you're not a robot" CAPTCHA. So in addition to writing the script to navigate and click through the hierarchy and then download the information once it got to the right place, we also had to write some code to handle the CAPTCHA solving and a few other small things. Each individual case is super unique and often requires a more technical scraping solution than you can get with a browser plugin.
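Once the record pages are downloaded, the analysis Kristin describes has two parts: pulling the officer's name, badge number, and citation attributes out of each page, then computing per-officer breakdowns. The sketch below covers only those two parts, using the standard library's `html.parser`. It assumes a hypothetical page layout where each field sits in an element carrying a `data-field` attribute; the real Palm Beach County pages are not shown in the source, and the navigation and CAPTCHA handling are not reproduced here.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser


class RecordParser(HTMLParser):
    """Collect text from elements marked with a hypothetical `data-field` attribute.

    Simplification: any closing tag ends the current field, so this assumes
    field values are plain text without nested markup.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        field = dict(attrs).get("data-field")
        if field:
            self._current = field
            self.fields[field] = ""

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        if self._current:
            self.fields[self._current] += data


def parse_record(html):
    """Return a dict of field name -> stripped text for one record page."""
    parser = RecordParser()
    parser.feed(html)
    parser.close()
    return {name: value.strip() for name, value in parser.fields.items()}


def citation_breakdown(records, attribute):
    """For each officer badge, the share of citations per value of `attribute`."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record["badge"]][record[attribute]] += 1
    return {
        badge: {value: n / sum(c.values()) for value, n in c.items()}
        for badge, c in counts.items()
    }
```

The same `citation_breakdown` call works for any recorded attribute (citation type, location, or demographic fields) by changing the `attribute` argument. Comparing officers fairly would still require a baseline, such as the mix of stops in the same area.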
Unknown Speaker: That's amazing.
Noah Learner: So do you cackle when it when it runs?
Kristin Tynski: I mean, I was pretty happy when we finally got it to work, but it took one of our developers like three or four days to finish the code.
Noah Learner: Well, that's insane.
Kristin Tynski: Yeah. What's actually interesting is that there have been some court cases that set a precedent saying it's completely legal to fully scrape public government data repositories, even if they use anti-scraping techniques to try to prevent you from doing it. So although we had to be a little bit subversive with that approach, it's a completely legal thing to do. And in my opinion, it's a good thing to do. That data should be accessible to a wider audience of people; it shouldn't be hidden behind a whole lot of unnecessary navigation. You should just be able to get it all, so that we as citizens, as data journalists, and as content marketers can investigate important government information that affects us.
Noah Learner: That's amazing. Um, I feel a little out of my depth with all the stuff you're doing. A lot of the automation we do helps agencies run smoother by automating individual processes, whether it's onboarding or task management. I had not really thought about how to automate all the content marketing things. I build tools internally that auto-generate HTML output: you put in configuration variables, and it outputs HTML code that's formatted according to the CSS we anticipate. I feel like a rocket scientist when I get that to work, and I know my team appreciates it, because it saves them a lot of time. But the stuff you're doing is so next level; it's kind of mind-boggling.
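A config-driven HTML generator like the one Noah describes can be very small. This is a minimal sketch, not his actual tool: the configuration keys (`title`, `url`, `bullets`, `theme`) and the CSS class names are hypothetical placeholders for whatever a site's stylesheet expects. Every value is passed through `html.escape` so that characters like `&` and `<` in the config can't break the markup.

```python
from html import escape


def render_card(config):
    """Render a content 'card' snippet from configuration variables.

    Keys and CSS class names are hypothetical; adapt them to the target stylesheet.
    """
    title = escape(config["title"])
    url = escape(config["url"])  # quote=True by default, safe inside href="..."
    theme = escape(config.get("theme", "default"))
    items = "".join(
        f'    <li class="card__item">{escape(item)}</li>\n'
        for item in config.get("bullets", [])
    )
    return (
        f'<div class="card card--{theme}">\n'
        f'  <h2 class="card__title"><a href="{url}">{title}</a></h2>\n'
        f'  <ul class="card__list">\n{items}  </ul>\n'
        f"</div>"
    )


print(render_card({"title": "Q&A", "url": "https://example.com/", "bullets": ["One", "Two"]}))
```

Keeping the markup in one function means a CSS change only has to be reflected in one place, and the team fills in configuration values instead of hand-writing HTML.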
Kristin Tynski: Well, thank you for saying that. But I think it can be easy to get caught up in being proud of some technical solution and get a little lost about what the ultimate value of that work is. Sometimes the simplest, most straightforward solutions, code-driven or not, end up being the biggest time savers. So I think it's important to focus on ways you can automate your business and use new technologies to help you do that, but not to get so focused on leveraging a cool new technology that you don't consider whether there's an easier, more straightforward way to do it that would have a much higher ROI in terms of how long it would take to implement. Yeah.
Noah Learner: Mary asked a question. She said, "Kristin, you are goals?" I don't think that's actually a question. I think she's saying that you're the bomb.
Kristin Tynski: I appreciate that. Thank you.
Noah Learner: Is that right, Mary? Is that what you meant? Looks like a yes. Kristin, there you go.
Jordan Choo: Awesome. So, Kristin, we're coming to the top of our hour, and we do want to be respectful of your time. Thank you very much for spending an hour with us. Anything else you want to add before we say goodbye?
Kristin Tynski: No, I enjoyed meeting you guys, and I appreciate the conversation. I'm glad you see the implications of AI and NLP technologies in a similar way to how I see them, and I hope you continue that conversation. I appreciate you having me on and asking me about Fractl, too. I had a lot of fun. Thank you.
Noah Learner: I think we're going to have JR Oakes on pretty soon. We were talking about Python dependencies, and he and I were chatting about something he built. I was like, "Hey, I can't figure out how to get this to work," blah, blah. Every time I ask him a question, I feel like a total dumbass. And I said, "You know, I feel dumb again. Does this ever get easier?" No. No.
Kristin Tynski: I totally empathize with that. I sit next to Matt Gillespie, who's our head of data science, and he's a very, very proficient Python programmer. I'm constantly bugging him as I've been learning. You have to be a little bit humble and just say, "Listen, I know this is a really, really dumb question, but it's going to help me a lot here." So I just take the approach of investing in learning something that's fun for me, not being embarrassed about being an idiot, and being okay asking for help from colleagues who are much better at this than I am. Yeah.
Noah Learner: I'm pretty blown away by this talk, and I feel like I couldn't focus enough, so I humbly apologize. Typically I'm more on point, because I don't have to physically hold the iPad steady the whole time, folks. Anyway, this was amazing. Kristin, you rock. Jordan, this was amazing. Thank you, everybody; we really appreciate you spending an hour with us, and we look forward to our next hangout in two weeks. Our guest is to be announced. We're very excited; we're talking to a couple of people. Kristin, have a great weekend. Jordan, you too. Everybody, keep on automating.
Kristin Tynski: You too, guys.
Jordan Choo: Thanks, everyone. Thank you, Kristin.
Kristin Tynski: Bye.