DataStory™: an interactive sequential art approach for data science and artificial intelligence learning experiences

Technical skills in data science and artificial intelligence have recently become highly desirable for industry positions as well as a focus of STEM education programs in higher education. However, most educational training and courses in data science and artificial intelligence are abstract and highly technical, which makes them unsuitable for many audiences. In this paper, we propose a sequential art approach that uses visual storytelling with integrated coding learning experiences to teach data science concepts. A scoping literature review was conducted to answer the following question: does sufficient evidence exist in the literature to support a sequential art approach to data science and A.I. education? To answer it, the learning science, sequential art, and dual coding literatures were interrogated. With knowledge gained from this review, an initial DataStory™ prototype was constructed on a technical platform capable of delivering an engaging and interactive sequential art learning experience. Finally, findings from a focus group study using the DataStory™ prototype are discussed, and participant feedback on this new learning experience is reported.


Introduction
In many ways, the delivery of technical training (and, more recently, of data science training) has not changed much over the years. The traditional lecture reigns supreme, usually delivered with the aid of an unending parade of PowerPoint slides. For students, the experience is often mind-numbing, as it has been for the authors of this article. This method of instruction is parodied by Ben Stein as he delivers an economics lecture in Ferris Bueller's Day Off (1986). The students are quickly rendered comatose by the drone of Stein's voice. The scene perfectly captures what learning looks and feels like on the technical education death star, the burned-out hulk of a once-productive sun where all enthusiasm and motivation for learning quickly fizzle.
The need to escape the technical education death star is especially acute in artificial intelligence (A.I.) education. Admittedly, artificial intelligence is a broad field, encompassing machine learning, deep learning, and a wide variety of data management tools and techniques. This article takes an expansive definition of A.I., one that includes foundational statistical ideas such as regression and probability that act as gateway ideas into the field (Russell & Norvig, 2009). Even this basic content, however, is usually presented in a technical and abstract way, far removed from the mind's preference for information delivered in story form. So, is there a solution? Could a sequential art approach to data science education launch us in a new and more productive direction? Comics artist Will Eisner coined the term "sequential art" to describe art forms that use images deployed in a specific order for the purpose of graphic storytelling (Eisner, 2008). Sequential art "refers to a number of sequentially juxtaposed abstract images that focus on form and technique, which may elicit from the viewer an aesthetic response, a notional sense of narrative and/or a possible theme" (Tabulo, 2014, p. 30). This article reports the team's initial search for answers to these questions, together with findings from participants' interactions with the development team's first DataStory™ prototype. The story told here consists of five parts. Section one presents the rationale for a new approach to A.I. education, and section two describes the scoping literature review conducted by the research team, whose purpose was to discover what research has been conducted on sequential art in data science and A.I. education. Section three briefly discusses what was learned from the scoping review, and section four introduces the first DataStory™ prototype along with findings from a focus group study in which participant reactions to this new type of learning experience were recorded. Finally, a concluding section summarizes what has been learned so far.

Open Access
Innovation and Education
*Correspondence: danielmaxwell@ufl.edu
1 Research Computing, University of Florida, Gainesville, USA
Full list of author information is available at the end of the article

Rationale
Findings from Deloitte's survey of executives' knowledge of cognitive technologies and artificial intelligence indicate an accelerating need for A.I.-literate employees across the world (Deloitte, 2018). In fact, demand for such workers already exceeds supply, producing what Deloitte calls an A.I. skills gap, with a majority of business executives (68%) reporting a moderate to extreme skills gap (Deloitte, 2020). Demand for qualified A.I. faculty far exceeds supply as well: there is presently a scarcity of A.I. faculty who can develop and deliver high-quality learning experiences. These facts point to one conclusion: all countries will need to develop efficient ways of delivering engaging, high-quality A.I. learning experiences. The creation of massive open online courses (MOOCs) has been one response to this problem. However, the extremely high attrition rates in these free courses raise concern. A recent almanac update in the Chronicle of Higher Education, for example, reports retention rates of 10% or less on the edX platform, which has also seen steep drop-offs in overall enrollment (Chronicle of Higher Education, 2019).
The essential issue can be stated this way: how do we create engaging A.I. learning experiences that motivate and retain students? Because A.I. education is such a new field, little is known about this question. Most of the published literature, for instance, is a byproduct of the Symposium on Educational Advances in Artificial Intelligence, held annually since 2010. Other investigators, with rather narrow research agendas, have contributed the following observations. Lavesson's (2010) case-study measured prior mathematical knowledge in relation to student attainment of learning objectives in an A.I. course. There is currently an increasing need for wider access to A.I. learning experiences, including the development of reusable training modules and the teaching of A.I. to non-majors (Sulmont et al., 2019; Way et al., 2016, 2017). Even so, pedagogical challenges remain, and these can benefit from structured approaches such as the WINGS workflow system for composing A.I. processes (Gil, 2016; Sulmont et al., 2019). Further, early work indicates positive motivational effects when gaming strategies are integrated into machine learning education (Wallace et al., 2010).
Unfortunately, none of the articles just cited referenced sequential art or its educational potential. Nevertheless, the positive effect of games on machine learning education reported by Wallace et al. (2010) was tantalizing, since comics and gaming are close cousins. In this article, the terms sequential art and cartoon story are used interchangeably, because both refer to the same storytelling medium, although sequential art is the preferred academic term. Sequential art has some unique characteristics that make it an ideal medium for explaining A.I. concepts and algorithms. First, sequential art is noteworthy for its innovative and creative use of visuals. A cartoon world, for example, is a visual world, characterized by vibrant colours and strong images. Second, sequential art adds text to those images to advance a storyline. In comics, the images and text work together seamlessly to deliver a captivating entertainment experience. The only thing missing is educational content, in this case a learning arc capable of supporting interactive A.I. programming exercises in real time.
The ease with which comic book readers recall famous storylines and characters suggests that this medium is rich with pedagogical possibilities, including long-term retention of basic A.I. concepts. It was time to broaden the search beyond A.I. and see if anyone had explored the educational potential of sequential art.

Literature review
Rather than conduct a comprehensive (systematic) literature review, the DataStory team sought answers to a limited set of questions. Because the much broader question of effective STEM instruction has been studied extensively in the literature, a primary research question was first articulated in order to delimit the scope of this inquiry. It reads as follows: Does sufficient evidence exist in the literature to support a sequential art approach to data science and A.I. education? The team then identified three additional, more specific sub-questions for the scoping review.
• RQ1: What evidence exists to support the instructional value of visuals and sequential art to explain technical concepts?
• RQ2: What evidence exists to support the instructional value of stories to deliver technical content?
• RQ3: What evidence exists in the literature to describe best design practices to construct visual story learning experiences which maximize content retention and mastery of technical concepts?
While the first two sub-questions seek to discover what is currently known about the instructional potential of visual storytelling, the third focuses on the principles that ought to guide the design and delivery of these learning experiences. Stated another way, an answer to the third question will provide practical guidance on best practices for using sequential art to deliver A.I. learning experiences to students.
The team's research questions suggested an initial search strategy, including literature domains with the highest probability of offering relevant articles and research. Prior to the start of the DataStory research project, the project's principal investigator (PI) had already encountered the learning sciences literature, while investigating best instructional practices in data science education. That investigation, in turn, introduced him to dual coding theory, an active area of research in the learning sciences and a natural complement to the educational use of sequential art. The three primary domains are pictured in relationship to each other in Fig. 1.
Because the primary thrust of this inquiry is educational, it is fitting that the learning sciences provide the overall context. Learning science is a relatively new field that builds on and extends the older field of cognitive psychology. Both fields focus on the cognitive dimensions of how we learn and on applying that knowledge to the design of effective learning experiences.
Sequential art, in relation to the learning sciences, is the primary domain of interest. And since sequential art combines a text-based storyline with a visual depiction of it, dual coding theory serves as an ideal theoretical framework, given its focus on the impact of text combined with visuals in the learning process. Dual coding theory holds that the human mind processes incoming information in two channels: one for visual input and a second for verbal input. Learning experiences that combine both channels should therefore be more effective, provided their design reflects best practices.
As depicted in Fig. 1, the twofold nature of dual coding is pictured as two channels, with verbal on one side and visual on the other. On the verbal (text) side, the fields of narrative and case-study learning are of specific interest. Case-studies are a unique kind of educational story with a long and distinguished history, while narrative learning is more general in scope. On the visual side, multi-media and visual learning are the domain concepts deemed relevant to this study.
As stated at the start of this section, the DataStory research team felt that a comprehensive review, often called a systematic literature review, was not warranted, given the limited amount of literature. Instead, the team selected a method similar to that employed in a scoping study, in which a researcher or research team seeks to quickly map the foundational concepts that underpin a research area. Scoping studies are also useful when an area is complex and its main sources and types of evidence have yet to be comprehensively reviewed. Arksey and O'Malley (2005) write that the "scoping study method is guided by a requirement to identify all relevant literature regardless of study design … To this end, researchers may not wish to place strict limitations on search terms, identification of relevant studies, or study selection at the outset." They continue, "The process is not linear but iterative" (p. 22). That last point proved true here: initial findings were subsequently enlarged by additional rounds of discovery, primarily through mining the citations of highly relevant articles. The DataStory research team thus employed a multi-step method to discover what is presently known about the use of sequential art in data science and A.I. instructional settings. Each step is described in the sections that follow.

Three search steps
When conducting a search in an electronic database, an overall search strategy is often formulated upfront. Although it was not necessary to conduct an exhaustive search of the kind a systematic review requires, it is best practice to conduct a search in three phases (manual, automatic, and snowballing) to ensure that a relatively rigorous process is used to develop a precise search strategy (Keele, 2007). For instance, a manual search, in which search venues are preselected, is unlikely to identify all relevant literature, but it provides indicators such as keywords, concepts, and domains that can inform an effective automatic search across selected databases. Likewise, the results of the manual and automatic searches can seed a snowball search that identifies relevant literature not yet captured. These three phases (manual, automatic, and snowballing) are described in detail below. Once each domain search was completed, the screeners reviewed and filtered the search results using predefined inclusion criteria, and any disagreements between screeners were resolved by the study's principal investigator.

Manual search
A manual search was first initiated in February 2020. In this preliminary phase, the authors jointly chose the venues (journals, conference proceedings, and book chapters) recognized as highly specific or seminal to each domain, restricted to the preceding ten-year period (2010-2020). A cursory review of this preliminary manual search revealed that key literature was missing; the authors knew this because they had already surveyed the relevant domains and knew some of the primary authors in each one. The search was therefore extended to a date range of 1980 to 2020 to capture any historically significant literature that might inform a new perspective.

Automated search
The automated search was conducted in the databases listed in Table 2, with the search scope limited to 1980-2020. A search string was created for each concept, and a record counted as a positive result if the string matched one or more of its title, abstract, or keywords fields.
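To make the matching rule concrete, the following Python sketch builds a parenthesised OR group from a list of terms and checks whether a bibliographic record matches any term in its title, abstract, or keywords. It is a simplified local illustration only, not the query syntax or behavior of any particular database; the record layout (a dict of plain strings), the helper names, and the treatment of `*` as "any run of word characters" are assumptions.

```python
import re

def or_group(terms):
    """Join search terms into a parenthesised OR group, quoting phrases and hyphenated terms."""
    quoted = [f'"{t}"' if (" " in t or "-" in t) else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"

def term_pattern(term):
    """Compile a case-insensitive regex for one term; '*' matches any run of word characters."""
    body = re.escape(term).replace(r"\*", r"\w*")
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)

def matches(record, terms, fields=("title", "abstract", "keywords")):
    """Return True if any term occurs in any of the searched fields of the record."""
    patterns = [term_pattern(t) for t in terms]
    return any(p.search(record.get(f, "")) for f in fields for p in patterns)

comics_terms = ["comic", "comics", "sequential art", "comic strip"]
print(or_group(comics_terms))
# (comic OR comics OR "sequential art" OR "comic strip")

# Hypothetical record used only for illustration.
record = {"title": "Teaching statistics with comics", "abstract": "", "keywords": "computer education"}
print(matches(record, comics_terms))           # True: "comics" appears in the title
print(matches(record, ["comput* education"]))  # True: the wildcard matches "computer education"
```

Fields outside the three searched ones are ignored, which mirrors the title/abstract/keywords restriction described above.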

Screening
This initial set of articles was then screened by a three-member team: an academic research librarian and two graduate students. The team applied a set of inclusion and exclusion criteria to identify the search results that answered the study's research questions.
Search strings by domain and concept
Computing Education (with data/data science education/computer education): (data OR dataset OR "data set" OR datasets OR "data science" OR "data science education" OR "computing education" OR "comput* education")
Computing Education (with comics/sequential art): (comic OR comics OR "sequential art" OR "sequential art narrative" OR "sequential-art" OR "sequential-art narrative" OR cartoon OR cartoons OR "comic strip" OR "comic strips")
Learning Sciences (from learner perspective): (learner OR learners OR student OR students OR "novice learner" OR "non-major" OR nonmajor)
Learning Sciences (from teaching perspective): (teach OR teaching OR instruct OR instruction OR instructional OR educate OR education OR educational)
Dual Coding (overall theory): ("dual code" OR "dual-code" OR "dual coding" OR "dual-coding" OR "dual code theory" OR "dual-code theory" OR "dual coding theory" OR "dual-coding theory")
Dual Coding-Verbal Channel (with storytelling): (story OR stories OR storyline OR storyteller OR storytelling OR narration OR narrator OR narrative OR narratives)
Dual Coding-Verbal Channel (with case study): (case OR "case-study" OR "case study" OR "case studies" OR "case-based" OR "case based")
Dual Coding-Visual Channel (with comics/sequential art): (comic OR comics OR "sequential art" OR "sequential art narrative" OR "sequential-art" OR "sequential-art narrative" OR cartoon OR cartoons OR "comic strip" OR "comic strips")
Dual Coding-Visual Channel (with storytelling): ("visual learning" OR "visual learner" OR "visual learners" OR "visual stimulation" OR "visual art" OR "visual arts" OR "visual narrative" OR "visual story" OR "visual storytelling" OR "visual curriculum" OR "visual media")
An article identified in the search results was retained if it met the predefined inclusion criteria and excluded if it did not; the inclusion and exclusion criteria applied are stated in Fig. 2. References that met the inclusion criteria were then entered or imported into Clarivate's EndNote Web with all available citation data. These references then served as the starting point for the final step of the search process, detailed below.
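The screening workflow (independent include/exclude decisions, with disagreements resolved by the principal investigator) can be sketched in Python as follows. This is a hypothetical illustration, not the team's tooling: the actual criteria in Fig. 2 are not reproduced here, two screeners are modeled for simplicity, and the year/topic rules in the usage example are invented stand-ins for human judgement.

```python
from dataclasses import dataclass

@dataclass
class ScreeningResult:
    included: list
    excluded: list
    disagreements: list  # records on which the two screeners differed

def screen(records, screener_a, screener_b, adjudicate):
    """Apply two independent include/exclude decisions; send disagreements to an adjudicator.

    screener_a, screener_b and adjudicate are callables returning True (include)
    or False (exclude); they stand in for human judgements against the criteria.
    """
    included, excluded, disagreements = [], [], []
    for rec in records:
        a, b = screener_a(rec), screener_b(rec)
        if a == b:
            decision = a
        else:
            disagreements.append(rec)
            decision = adjudicate(rec)
        (included if decision else excluded).append(rec)
    return ScreeningResult(included, excluded, disagreements)

# Hypothetical records and criteria, for illustration only.
records = [
    {"id": 1, "year": 2015, "topic": "comics"},
    {"id": 2, "year": 1975, "topic": "comics"},
    {"id": 3, "year": 2018, "topic": "infographics"},
]
result = screen(
    records,
    screener_a=lambda r: r["year"] >= 1980,
    screener_b=lambda r: r["year"] >= 1980 and r["topic"] == "comics",
    adjudicate=lambda r: r["topic"] == "comics",
)
print([r["id"] for r in result.included])       # [1]
print([r["id"] for r in result.disagreements])  # [3]
```

Keeping the list of disagreements makes it easy to report how often adjudication was needed.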

Snowball search
Although the automated search covered a wide array of databases, it cannot be considered comprehensive, so a snowball search was also conducted. In this step, the principal investigator first identified the citations of highest relevance, and an electronic copy of each item was secured and read. Special attention was paid to the reference sections: promising citations were identified, located, and added to the expanding set of highly relevant items. The process resembles making a snowball, in which one starts with a compact center (a single article) and builds outward. The popular Web of Science citation index supports and greatly simplifies this work by letting researchers quickly traverse an article's citation web in both directions: the items it cites and the articles that subsequently cited it. This final step greatly enlarged the number of highly relevant items.
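Snowballing is, in effect, a breadth-first traversal of the citation graph in both directions. The Python sketch below assumes the graph is available as two hypothetical in-memory dicts, `references` (backward: items an article cites) and `cited_by` (forward: articles citing it); in the study this traversal was done by hand through the Web of Science interface, and the `is_relevant` callable and `max_rounds` limit are illustrative stand-ins for the investigator's judgement.

```python
from collections import deque

def snowball(seeds, references, cited_by, is_relevant, max_rounds=2):
    """Breadth-first backward and forward snowballing from seed articles.

    references[a] lists the items article a cites (backward snowballing);
    cited_by[a] lists the articles that cite a (forward snowballing).
    """
    included = set(seeds)
    frontier = deque((article, 0) for article in seeds)
    while frontier:
        article, depth = frontier.popleft()
        if depth >= max_rounds:
            continue  # stop expanding once the round limit is reached
        for candidate in references.get(article, []) + cited_by.get(article, []):
            if candidate not in included and is_relevant(candidate):
                included.add(candidate)
                frontier.append((candidate, depth + 1))
    return included

# Hypothetical citation graph: A cites B and C, B cites D; E cites A, F cites E.
references = {"A": ["B", "C"], "B": ["D"]}
cited_by = {"A": ["E"], "E": ["F"]}
found = snowball(["A"], references, cited_by, is_relevant=lambda a: a != "C")
print(sorted(found))  # ['A', 'B', 'D', 'E', 'F']
```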

Results
The final set of search results, including those discovered during the snowball search step, is listed by domain in Table 3. The totals also include items discovered serendipitously, outside the three-step search process outlined earlier.
A preliminary review of the search results revealed that few attempts have been made to fully integrate insights from these three domains (learning science, sequential art, and dual coding) into an innovative learning experience. That gap left room for a new approach to A.I. education, and the literature appeared to support its viability. But exactly what did it have to say? The answer is explored in the next section.

Literature discussion
The final count of screened references in Table 3 is relatively small, indicating that a sequential art approach to science education has not been extensively explored. Had the research team restricted the search even further, limiting it to data science and A.I. education, there would have been no results whatsoever. Hence, a broader search was executed. What the team discovered is presented in the sections that follow, grouped by the domains from Fig. 1.

Learning science
Learning science is the field of study concerned with how humans learn and how to apply that knowledge in instructional settings; cognitive psychology is the older, more established line of inquiry from which the learning sciences arose. Neither field, however, addresses the creative aspects of educational storytelling and visual content delivery.
Learning scientists have discovered a great deal in the past 20 years about how the brain works, how it retains information, and how it prefers to have content presented to it. Recent years have also seen a proliferation of practical advice on how educators ought to apply findings from this field to the creation of effective learning experiences (Boser, 2017; Brown, 2014; Willingham, 2009). Because the DataStory team wanted to rethink the way data science and A.I. content is delivered, instructional design is of the utmost importance. The goal was to create a new kind of learning experience in which the learning and story arcs are seamlessly integrated, with interactive exercises, quizzes, and reflective exercises constructed according to established learning science principles.
The expansive nature of the cognitive psychology and learning science literature meant that this search had to be limited in scope, restricted to items with direct relevance to the project's goals. Busch and Watson's (2019) overview of the learning science literature, which describes the seminal studies every teacher ought to know, is an excellent starting point. According to Busch and Watson (2019), the principles of spacing and interleaving are well established in the literature (Cepeda et al., 2008; Rohrer & Taylor, 2007). Spacing is the idea that it is better to spread learning out over a period of time than to cram it all at once. Interleaving, by contrast, is the practice of mixing different topics or problem types within a study session rather than practicing one type in a single block.
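The two principles can be illustrated with a small Python sketch of how practice might be sequenced in a learning module. The topic names, problem labels, and review gaps of 1, 3, and 7 days are hypothetical choices for illustration, not prescriptions from Busch and Watson or the studies they cite.

```python
from datetime import date, timedelta
from itertools import zip_longest

def spaced_review_dates(first_session, gaps_in_days=(1, 3, 7)):
    """Spacing: schedule review sessions spread out over time instead of one cram session."""
    dates, current = [], first_session
    for gap in gaps_in_days:
        current += timedelta(days=gap)
        dates.append(current)
    return dates

def blocked(problems_by_topic):
    """Blocked practice: every problem on one topic before moving to the next."""
    return [p for problems in problems_by_topic.values() for p in problems]

def interleaved(problems_by_topic):
    """Interleaved practice: rotate through topics so consecutive problems differ in type."""
    order = []
    for round_ in zip_longest(*problems_by_topic.values()):
        order.extend(p for p in round_ if p is not None)
    return order

practice = {"regression": ["R1", "R2"], "probability": ["P1", "P2"], "classification": ["C1"]}
print(blocked(practice))      # ['R1', 'R2', 'P1', 'P2', 'C1']
print(interleaved(practice))  # ['R1', 'P1', 'C1', 'R2', 'P2']
print(spaced_review_dates(date(2021, 1, 4)))
# [datetime.date(2021, 1, 5), datetime.date(2021, 1, 8), datetime.date(2021, 1, 15)]
```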
In addition to spacing and interleaving, Busch and Watson (2019) highlight other instructional best practices. Frequent retrieval practice and low-stakes quizzing are endorsed (Roediger & Karpicke, 2006), as are positive emotional engagement (Pekrun et al., 2017), write-to-learn assignments (Gingerich et al., 2014), and task-related feedback (Kluger & DeNisi, 1996). Naturally, the immediate delivery of task-related feedback presupposes a platform capable of delivering interactive learning experiences with real-time feedback, functionality the DataStory team considered vital.
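A low-stakes retrieval question with immediate, task-related feedback might look like the following Python sketch. The `QuizItem` structure, the `grade` helper, and the sample question and hints are hypothetical; the point is that feedback is keyed to the specific wrong answer (the task) rather than to the learner.

```python
from dataclasses import dataclass, field

@dataclass
class QuizItem:
    prompt: str
    answer: str
    # Feedback keyed to specific wrong answers, so it addresses the task rather than the learner.
    hints: dict = field(default_factory=dict)
    default_hint: str = "Not yet. Revisit the relevant section and try again."

def grade(item, response):
    """Return (is_correct, feedback) immediately after a low-stakes retrieval attempt."""
    normalized = response.strip().lower()
    if normalized == item.answer.lower():
        return True, "Correct."
    return False, item.hints.get(normalized, item.default_hint)

item = QuizItem(
    prompt="Spreading study sessions out over time instead of cramming is called ____.",
    answer="spacing",
    hints={"interleaving": "Interleaving mixes topics within a session; this term is about timing."},
)
print(grade(item, " Spacing"))      # (True, 'Correct.')
print(grade(item, "interleaving"))  # (False, 'Interleaving mixes topics within a session; this term is about timing.')
```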
And finally, findings from cognitive psychology support the use of visuals (Mayer & Anderson, 1991) and stories as learning tools. The combination of pictures and words to explain concepts has a long and distinguished research pedigree, with Allan Paivio formulating dual coding theory in the early 1970s. The foundational ideas and literature related to this theory are presented in a later section.
The learning science literature made two important contributions to the team's evolving understanding of how humans learn and how that knowledge can be applied in an instructional setting. First, it supplied principles for constructing learning arcs grounded in how humans learn new concepts; these arcs form the foundation of each learning experience. Second, it highlighted the importance of dual coding theory, including the instructional value of combining visuals and text in a learning experience.

Sequential art
The extensive and rapidly evolving interest in sequential art as an educational medium came as a surprise to the DataStory research team. Sequential art is already being used as an instructional tool in a variety of disciplines, mainly in the humanities and social sciences, less so in the sciences. Thus, the literature discussed in this section is restricted to the use of sequential art in science educational settings.
While dual coding theory justifies the instructional value of well-designed visuals, initial research suggests that the synthesis of text and visuals achieved in sequential art makes it an ideal platform for content delivery. Jee and Anggoro (2012) approach the topic of "comic cognition" and the educational value of science comics from a theoretical perspective, and they lay out the pedagogical risks and benefits of this medium very clearly. The authors conclude that the instructional value of comics has yet to be established empirically and therefore recommend additional research, including increased engagement with the learning science literature, which is precisely what was done in the previous section.
Farinella (2018) likewise provides a comprehensive overview of comics from the perspective of science communication. Like Jee and Anggoro, Farinella writes that few have "attempted to quantify the effects of comics on the communication of science" (p. 3). This claim accurately describes the situation: fewer than a dozen studies have explored the efficacy of comics in the science classroom (Aleixo & Norris, 2010; Hosler & Boomer, 2011; Spiegel et al., 2013; Weitkamp & Burnet, 2007). Farinella summarizes the studies conducted so far by stating that researchers have consistently found that comics lead to greater levels of student engagement and motivation (p. 3). Of equal importance, he devotes a subsequent section to the instructional value of narrative, providing additional support for including this domain in the review. The narrative literature is discussed in a later section.
During the screening process, the DataStory research team came across some interesting yet somewhat tangential citations. Tilley (2017), for example, offers a detailed historical account of educational comics, though most of the applications are from non-science fields. The field of data comics has also emerged in recent years (Bach et al., 2017; Wang et al., 2019a, b; Zhao et al., 2015). This evolving area of practice and research, however, is closely tied to journalism and infographic communication and is therefore of only indirect interest. Finally, humor is a key component of sequential art, and its instructional value should not be overlooked (Banas et al., 2011). Fear can shut down student learning, but laughter can ease that fear and open the learner to positive engagement.
Paivio (1971, 1986, 2007) is widely regarded as the originator of dual coding theory. The basic postulate of dual coding theory is that the human brain consists of separate but interconnected verbal and nonverbal (visual) systems. As stated earlier, dual coding theory and sequential art complement each other, since both focus on the effective use of narrative and visuals to deliver an impactful learning or entertainment experience. Consider, for example, the use of images and text in Fig. 3.

Dual coding theory
The effective use of visuals in combination with text has long been a focus of learning science research. According to dual coding theory, "imaginal and verbal processing independently contribute to memory for concrete words, whereas only verbal processing is usually possible for abstract material" (Schmidt, 2008, p. 125). This is a key idea, because it suggests that learning experiences which combine visuals with explanatory text offer the brain two routes for recording long-term memories. A dual channel approach to instructional design therefore makes it possible to create learning events with mutually reinforcing modalities, provided that text and visuals are properly integrated.
Since its inception in the early 1970s, dual coding theory has been widely researched and its basic tenets largely validated. Although a comprehensive review of this literature is beyond the scope of this article, the educational implications of dual coding theory have been examined by Clark and Paivio (1991). Teachers will also find the practical advice offered by Caviglioli (2019) beneficial. The literature underpinning the verbal and visual channels is examined more closely in the two sections that follow.

Dual coding-verbal channel
The narrative (verbal) dimension of the comic book experience is well established. Through speech and thought bubbles, readers can participate in the inner lives of the story's various protagonists. The importance of the storyline has also led to famous pairings between writer and artist, Stan Lee and Jack Kirby being but one notable example. A compelling story or narrative is thus essential, and the findings of this scoping review support that view.

Narrative learning
Narrative learning is "learning through stories: stories heard, stories told, and stories recognized" (Clark & Rossiter, 2008). In a Scientific American article, Jeremy Hsu (2008) draws an interesting connection between our love of storytelling and the neurological workings of the human mind. The mind, in many ways, is a story machine, eagerly consuming content presented to it in this privileged format. Willingham (2009) writes "that psychologists sometimes refer to stories as 'psychologically privileged'" (p. 51). This message has not gone unnoticed among science educators. Olson (2015, 2018), for example, has long championed narrative, urging scientists to master the fundamentals of great storytelling.
Likewise, recent years have witnessed increasing interest in data storytelling, specifically in journalism and related communication fields. Though this emerging field shares some commonality with sequential art in education (for example, both employ sequential narrative structures), the resemblance is superficial. Whereas journalism concerns itself with events in the real world, sequential art is not limited to reality and often ventures into imaginary worlds. Additionally, data journalism is concerned with communicating a factual narrative rather than with developing relatable and empathetic cartoon characters. For these reasons, the literature in this area, though interesting, was deemed outside the scope of this inquiry.
Transportation theory, on the other hand, offered some valuable insights into the power of cartoon storylines. Essentially, transportation theory describes the process whereby a reader gets caught up in a story, losing all sense of place and time. Readers transported by a story frequently experience vivid mental imagery, the result of a high level of emotional engagement. Green (2005; see also Green & Sestir, 2017) is the author and primary advocate of this theory. The theory's relevance to the DataStory project lies in its aspirational value: the creative team seeks to transport the learner in each and every DataStory learning experience. Engagement precedes content delivery. The power of story to increase student engagement has been recognized for some time, specifically in disciplines where case-study instruction is prominently featured. Case-study learning is a particular type of narrative learning in which cases form the core of the storytelling.

Case-study learning
As just noted, the use of "educational" stories is not a recent development: such stories have been featured as case-studies in business, medicine, and law for over 100 years. Case studies are a special kind of story, a narrative designed to build reasoning skills while also imparting content. Herreid (1997) writes, "Cases are stories with a message. They are not simply narratives for entertainment. They are stories to educate" (p. 92). As such, the case story has proven to be an effective pedagogical tool.
There is a substantial and growing body of literature that describes how to develop and use case-studies in the sciences. Herreid (2007, 2012), the founder of the National Center for Case Study Teaching in Science, is a prominent figure in case-study instruction in the sciences. His many publications provide a practical introduction to the art of case-study construction and teaching. The center's website also features 778 case-studies (National Center for Case Study Teaching in Science, 2021).
About a dozen articles in science-related journals have reported positive learning outcomes related to the case-study method (Harmon et al., 2014; Grunwald & Hartman, 2010; Rybarczyk et al., 2007; Chaplin, 2009; Nair et al., 2013; Wilcox, 1999; Bonney, 2015; Yadav & Beckerman, 2009; Bjorn et al., 2013; White et al., 2009). Only Yadav et al. (2010) reported "no significant differences between traditional lecture and case teaching method on students' conceptual understanding" (p. 55). Even so, they still viewed case-studies in a positive light, given their ability to actively engage students in the learning process. Faculty also appreciate the benefits of case-studies as learning tools. Yadav et al. (2007) conducted a national survey of faculty perceptions of the case-study method and found that a majority reported positive outcomes when using this method.
Although the case-study literature validates the instructional value of narrative learning (the verbal side of dual-coding theory), case-studies, as currently conceived and constructed, lack certain features the DataStory research team deemed critical. The first issue is that of focus. With case-studies, the instructional focus is the concise presentation of a problem, leading to the acquisition of relevant discipline-specific content and thinking skills. Data, as a result, are not prominently featured in most case-studies. But to innovate in the A.I. education space, the focus must shift to one where data assume the leading role in each learning experience.
The second issue is interactivity. The case-studies featured on the National Center for Case Study Teaching in Science website, for instance, are offered in static formats such as MS Word, PowerPoint, or Adobe PDF. The instructional innovation envisioned by the DataStory research team, on the other hand, needs to be highly interactive, constructed using open-source tools such as RMarkdown or Jupyter Notebooks. These platforms allow students to run blocks of code and receive immediate feedback. Learners can also modify code blocks to fit their needs and/or analyze similar kinds of data sets. That is, these tools support dynamic, interactive learning experiences with real-time feedback, making them ideal learning tools for novice and experienced learners alike.
The third issue is storytelling. Most science case-studies present a problem and the associated content in a purely factual way. The conspicuous absence of a compelling storyline with interesting protagonists continues to be an issue, resulting in learning experiences which read like textbooks. Fortunately, that has begun to change. Recent articles by Herreid et al. (2014) and Young and Anderson (2010) suggest that the case-study research community has started to recognize the importance of an engaging storyline capable of transporting the learner through a learning experience.

Dual coding: visual channel
The educational value of well-designed visuals has been demonstrated in practice. Sal Khan of Khan Academy, for example, effectively uses visuals and drawings in his videos. And the A.I. educator Andrew Ng uses visuals at specific points in his free machine learning class, demonstrating that A.I. concepts are amenable to visual representation. But is there theoretical and empirical evidence to justify the visual channel in educational settings? Yes, there is. Mayer (Mayer & Anderson, 1991; Mayer, 2014) is a leading scholar in this area, with research beginning in the early 1990s and culminating in the recent publication of The Cambridge Handbook of Multimedia Learning. Mayer, like most of the authors featured in this handbook, works from a dual-coding perspective. In multimedia learning, however, the visual channel assumes greater importance and is enlarged to include not just static visuals but also video and animations. A key concern of researchers in this area is cognitive load: ensuring that the quantity of visual information and text does not overwhelm short-term memory.

An initial prototype
With knowledge of how others had used sequential art to deliver content in the science classroom, construction of a DataStory™ prototype commenced, with the intent of putting this knowledge to practical use. But first, the team needed to select a technical platform capable of delivering an engaging learning experience. The Shiny environment from RStudio was selected, with pedagogical functionality provided by the learnr package (rstudio.github.io/learnr/). The learnr package provides the functionality to create an interactive RMarkdown document that combines illustrations, coding exercises that users can edit and execute directly, quiz questions, videos, and interactive Shiny components. As with the notebook tools described above, students can run and modify blocks of code and receive immediate feedback. The finished DataStory is hosted on an open-access, cloud-based ShinyApps (RStudio) server, thereby eliminating potential student frustrations with accessing the story or setting up an executable environment on their own computers. A cloud-based platform also permits designers to quickly modify and update existing DataStories.
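To make the platform description concrete, a minimal learnr tutorial source might look like the following sketch. It is not the DataStory's actual source: the title, chunk labels, quiz wording, and the use of R's built-in `cars` data in place of the sardine data are all illustrative assumptions. It relies only on documented learnr conventions: the `learnr::tutorial` output format with `runtime: shiny_prerendered`, editable chunks marked `exercise=TRUE`, a companion `-hint` chunk, and `quiz()`/`question()`/`answer()` for multiple-choice checks.

````rmd
---
title: "DataStory sketch (hypothetical)"
output: learnr::tutorial
runtime: shiny_prerendered
---

```{r setup, include=FALSE}
# Illustrative sketch only; not the DataStory's own source.
library(learnr)
```

## Chapter 1: Smoothing

Narrative text and comic panels for the chapter would appear here.

```{r smoothing, exercise=TRUE}
# Stand-in data: R's built-in `cars` data set (not the sardine data).
# Try changing span (e.g., 0.3 or 1) and re-run the chunk.
fit <- loess(dist ~ speed, data = cars, span = 0.75)
plot(dist ~ speed, data = cars)
lines(cars$speed, predict(fit), col = "red")
```

```{r smoothing-hint}
loess(dist ~ speed, data = cars, span = 0.3)
```

```{r smoothing-quiz, echo=FALSE}
quiz(
  question("What happens to the LOESS curve as `span` increases?",
    answer("It becomes smoother", correct = TRUE),
    answer("It follows individual points more closely")
  )
)
```
````

Placing an exercise chunk, its hint, and a quiz next to the narrative text is one way to let learners practice immediately after a concept appears in the story.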
The initial prototype of the DataStory focused on foundational data science and statistics topics such as smoothing, correlation, and simple linear regression. The DataStory format could potentially be used to teach all audiences about advanced data science topics such as dimension reduction, classification, time series, and cluster analysis. A.I. topics such as deep learning, reinforcement learning, and natural language processing could also be introduced at a foundational level. A focus group was conducted as an initial study to examine how principles drawn from the learning theories in the literature review could be integrated into the DataStory prototype.

Focus group
This initial study assessed the efficacy of the DataStory™ prototype as a learning tool. A focus group study was used to test the DataStory with our target audience. The purpose of this study was to explore the design principles of the DataStory with potential participants and to gather their feedback and comments about the learning experience. The study used the prototype, which employs the sequential art approach, together with focus group feedback to improve the initial DataStory. The questions that guided the focus group were the following:
• What reactions did the participants have to the DataStory (characters, storyline, etc.)?
• What did participants have to say about the DataStory's flow, content (learning objectives), and interactivity (level of student engagement in the learning exercises)?
A single DataStory served as the prototype, allowing the team to understand participants' emotional engagement with the narrative, their motivation, and their perspective on the data science concepts featured in this learning experience. Participant feedback was solicited to improve existing DataStories and to inform how future ones are constructed. A single focus group, with a feedback survey consisting of both open-ended and Likert scale questions, was used to gather feedback and information from participants. Institutional Review Board approval was obtained before the focus group.
The DataStory prototype included multiple panels of sequential art across six chapters. The story follows a young girl and her grandfather as they try to figure out what may have contributed to the collapse of the California sardine industry in 1953, with the help of StatCat and DataDog, two characters from a data consulting company. The panels show the characters working on this research question, and R code is interspersed throughout the chapters; students must run the code to see the analyses and graphs that the characters use to answer the research question. At the end of the DataStory are interactive exercises in which students edit and write code to answer a series of questions.

Methods
A convenience sample of participants, consisting of graduate students and faculty, was recruited by the researchers. A total of four graduate students and one professor participated in the study, and informed consent was obtained before the start of the focus group. The focus group consisted of an introduction of the research team and the purpose of the study, a pre-survey with 6 questions, time (about 30 min) for the participants to work through the DataStory prototype, 15 post-survey questions, and a discussion with 3 guiding questions. The focus group lasted a total of 1 h and 47 min and was conducted via Zoom. The entire session was recorded and transcribed for analysis. Thematic analysis was used to analyze the qualitative data from the focus group transcript and the open-ended responses. This work was conducted by a team member with expertise in this method. In the first step of analysis, the researcher became familiar with the data by reading through the entire focus group transcript and all open-ended responses. Next, an initial set of codes was generated, along with examples for each code. The researcher then went back through all the codes and examples, searching for themes by combining initial codes. Finally, the themes were reviewed and defined using examples from the open-ended responses and the focus group transcript.
In the pre-survey, participants were asked to rate their experience with the following concepts on a 5-point Likert scale from "no experience" to "complete mastery": R programming (the statistical language used in the DataStory), correlation, simple linear regression, and line smoothing/LOESS regression. The participants had little to no experience with R programming and line smoothing/LOESS regression. However, they had slightly more experience (from little to moderate) with correlation and simple linear regression. This experience level was preferred because the learning objectives of the DataStory focused on using R to apply smoothing techniques, calculate and interpret Pearson's correlation coefficient, and conduct regression analyses.
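The three statistical learning objectives listed above can be illustrated in a few lines of base R. This is a minimal sketch, not the DataStory's own code: it uses R's built-in `cars` data set as a hypothetical stand-in for the sardine data, and the variable names are illustrative. It fits a LOESS smoother, computes Pearson's correlation coefficient both from its definition and with `cor()`, and fits a simple linear regression with `lm()`.

```r
# Hypothetical stand-in: R's built-in `cars` data (speed in mph, stopping
# distance in ft), used because the sardine data are not reproduced here.
x <- cars$speed
y <- cars$dist

# 1. Smoothing: local regression (LOESS); span = 0.75 is loess()'s default
smooth_fit <- loess(y ~ x, span = 0.75)
smoothed <- predict(smooth_fit)  # fitted values, in the original data order

# 2. Pearson's correlation coefficient from its definition, then via cor()
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
r_builtin <- cor(x, y, method = "pearson")

# 3. Simple linear regression by ordinary least squares
lin_fit <- lm(y ~ x)
slope <- unname(coef(lin_fit)["x"])

# In simple linear regression the slope equals r * sd(y) / sd(x)
slope_from_r <- r_builtin * sd(y) / sd(x)

print(round(r_builtin, 3))       # [1] 0.807
print(round(coef(lin_fit), 2))   # (Intercept) -17.58, x 3.93
```

Computing r by hand next to `cor()` is one way to connect the formula to the function call, and the final comparison shows how correlation and the regression slope are related.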

Reaction to DataStory characters and storyline
After engaging with the DataStory and completing some of the learning experiences, participants reacted positively with respect to the first research question, indicating that they liked the characters and the fact that the story opened with a research question. An important aspect of the DataStory's design is a narrative storyline with engaging and relatable characters. One participant said, "I like the characters. I thought they were good" (Participant 1, focus group transcript). Another aspect of narrative storytelling is an engaging storyline that hooks the reader at the start of the story. For this DataStory, that hook is the research question posed to the audience by one of the characters. Another participant said, "I like how you begin with research question[s] and not with my coding experience it has always started with codes. Like codes, codes, codes. What does this mean, how do you run this, but you started with research question[s]…" (Participant 3, focus group transcript).
In addition to positive comments, participants also provided constructive feedback with respect to the first research question. Some indicated that the story's length and the consistency of its language needed improvement. The post-survey asked for feedback regarding the length and density of the content. On a 6-point Likert scale from 0 ("worst") to 5 ("excellent"), participants rated the length of the DataStory from "fair" to "average," with one participant rating it "excellent." On a 6-point Likert scale from 0 ("none") to 5 ("completely overwhelmed"), participants rated the density of the content from "a little overwhelmed" to "moderately overwhelmed." One participant stated, "I think the story is necessary but could be shortened or embedded with the exercises. This will allow the users to be more immersed with the data and the story" (anonymous feedback from survey). Participants also recommended that the characters use technical language more consistently when talking about 'R' and other coding-related terms, and that clear definitions of all relevant concepts be added. For example, one participant stated that "You make assumptions that the user knows terms that come from different areas such as spawning, programming, coding. You don't provide any information about what a sardine is, what spawning is or why it's important. You mention 'R' but do not provide an explanation for what it is or does" (anonymous feedback from survey). Another participant mentioned that "I think language really quickly becomes overwhelming. Both in terms of how it's designed and then how it's used consistently. So, for example, you talk about programming, code, and coding and sort of synonymously and it's not clear to me what you're really talking about sometimes" (Participant 5, focus group transcript).

Comments/feedback about DataStory's flow, learning content, and interactivity
With respect to the second research question, participants had positive things to say about the flow, content, and interactivity of the DataStory. In particular, they appreciated the ability to run the code in the DataStory as well as the contextual information provided about the data. The integration of code and story is a distinctive feature of this platform. One participant stated, "I do think this [DataStory] was much more than compared to traditional courses. I like the fact that you could run the code in there. You know, it was embedded in there. That was pretty helpful" (Participant 2, focus group transcript). Another said, "I really like the combination of me learning it conceptually, and then seeing it in the graph, like what does smoothing mean…like if you run the code and then this is what happens to the visual part, I really like" (Participant 3, focus group transcript). The data context was another key design feature of this learning experience. That is, the data were appropriately contextualized, with background information embedded throughout the storyline. One participant stated that "I thought it was really helpful to have a data set that made sense. Like a story behind the data set to understand why I was doing these functions…" (Participant 1, focus group transcript).
Participants identified multiple areas needing improvement with respect to the second research question: the creation of a learning artifact as students work through the learning experience, additional direction and scaffolding for the exercises, and tighter integration of the exercises into the data learning events. In terms of flow and progression, participants suggested that having learners create an artifact or document as they work through the DataStory would be helpful. This final learning artifact would contain all of the code examples, summarizing everything learned in a single document. One participant said, "For me to take away from this and then to have as a reference for [when] I do the exercises and that sort of thing that there needs to be a conceptualization of some sort of output that I'm putting together myself that's a reference document" (Participant 5, focus group transcript). Another participant stated, "Just as mentioned before, handy-dandy reference guide to R code would have helped me complete final exercises more quickly" (anonymous feedback from survey). This feedback points to ways of improving how learners acquire the specific R functions that correspond to the DataStory's learning objectives. In terms of the educational delivery of the data learning events, participants recommended more detailed direction and scaffolding for the exercises. For example, one participant stated that "… so I thought maybe like more scaffolding, and the exercises, would have been really helpful" (Participant 1, focus group transcript). Another said, "The exercises seemed to ramp up way too fast. It went from editing a single function to writing complete code in one step. And there wasn't an easy way to go back and see the example used in the DataStory chapters" (anonymous feedback from survey).
Finally, participants suggested enhancing the interactivity of the DataStory by integrating the exercises more deeply into the data learning events. One participant said, "I almost think the exercises as written would be better if they were placed within the story and then the exercises that you do at the end would be like the open ones because the ones at the end of as they are currently did kind of feel like redo what you did with some slight modifications" (Participant 1, focus group transcript). Another stated, "Needs to be a better integration, though of asking me to do something. I see a lot of you do something and then you're waiting for me until the exercises and then I'm disconnected from I remember these parts but I don't remember where to go back to refresh my memory. And I think you need to ask me to practice after you see something happen" (Participant 5, focus group transcript).

Conclusion
The demand for an A.I.-literate workforce has never been greater, and this need, in turn, has created a related demand for engaging data science/A.I. learning experiences. However, the way in which technical education is delivered has changed little, if at all, over the past half century. What worked even a decade ago no longer works today, as learners start and then quickly abandon massive open online courses (MOOCs). Escape from the current situation, the technical education death star, is now more important than ever. A review of the literature reveals that a small number of science educators have begun to explore the pedagogical power of sequential art. Moreover, dual coding theory and the learning sciences strongly support the use of visuals in combination with text, further validating the innovative use of sequential art and data storytelling in interactive data science/A.I. learning experiences. The potential payoff clearly justified the risk, and the DataStory development team decided to act. The result was the world's first DataStory prototype, which embedded data science content and interactive exercises within a cartoon story.
Focus group participants responded positively to this innovation and suggested future refinements. Since then, a second DataStory has been finished, with two additional stories nearing completion. Improvements to the narrative arc and educational delivery produced this second version of the DataStory, which has a different storyline and revised characters. The new DataStory follows DataDog and StatCat on an adventure to determine, based on StatCat's long family history, whether a fountain-of-youth elixir called illudium phosdex exists. The