Data science is the art and science of collecting, organizing, processing, analyzing, archiving, preserving, and providing access to massive amounts of data in order to extract meaningful information.
As noted by LIS professional Amy Affelt in her book The Accidental Data Scientist: Big Data Applications and Opportunities for Librarians and Information Professionals (Information Today, 2015), librarians have always worked with reasonably large amounts of data (circulation desk metrics, budgeting and strategic planning, GIS demographic mapping, etc.), but most often in relational databases and similar familiar formats. The difference is that today’s volume of data generation is so massive (i.e., big data) that it can only be managed and made sense of through complex computer-based algorithms.
This is the data challenge now facing businesses, government agencies, educational institutions, healthcare organizations, and social media platforms, among others – how do they make sense of all these data points in order to make smarter decisions?
To understand the range of data-related employment opportunities, it’s useful to first consider the data lifecycle, or all the points in gathering, managing, using, and storing data where a data specialist might be involved. For example, a typical data lifecycle might involve the following steps:
- Collecting the raw data, which includes identifying the relevant/best data sources and vetting their credibility and appropriateness for the task or question at hand.
- Processing the data, i.e., bringing them into one or more data management systems in a uniform format and structure.
- Cleaning the data, which includes “scrubbing” the data of items such as duplications so reliable datasets can be created.
- Creating and applying algorithms or data queries that will surface data relevant to the question or issue being considered (often called “data mining”).
- Analyzing the datasets, or results of the models and algorithms that have been run, in order to identify meaningful, actionable patterns of information – in other words, what are these data telling us about the question we are asking? What projections about the future can we make based on these historical patterns?
- Communicating the outcomes, often through some type of data visualization or a dashboard format, in a manner that lets key stakeholders easily understand and base decisions on the findings.
- Preserving the data, creating and maintaining systems that provide for the preservation, access, retrieval, and potential re-use of the relevant data and datasets for future reference.
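As a rough illustration of how these lifecycle steps fit together, here is a minimal Python sketch. It is only one possible reading of the steps above, not a prescribed workflow: the circulation records, field names, and function names are all invented for the example, and a real project would typically rely on dedicated data management and visualization tools rather than the standard library alone.

```python
import csv
import io
import json
from collections import Counter

# Hypothetical raw export (invented sample data): circulation records with
# inconsistent formatting and one record that was exported twice.
RAW_CSV = """record_id,branch,item_type,checkout_date
1001,Main, Book ,2024-01-03
1001,main,book,2024-01-03
1002,East,DVD,2024-01-04
1003,Main,Book,2024-01-05
1004,East,Book,2024-01-05
"""

def collect(raw_text):
    """Collect: read raw rows from a source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def process(rows):
    """Process: bring every field into a uniform format (trimmed, lowercase)."""
    return [{key: value.strip().lower() for key, value in row.items()} for row in rows]

def clean(rows):
    """Clean: drop exact duplicate records while preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique

def query(rows, item_type):
    """Query: surface only the records relevant to the question being asked."""
    return [row for row in rows if row["item_type"] == item_type]

def analyze(rows):
    """Analyze: count checkouts per branch to reveal a simple pattern."""
    return Counter(row["branch"] for row in rows)

def communicate(counts):
    """Communicate: render a plain-text bar chart that stakeholders can scan."""
    return "\n".join(
        f"{branch:<6}{'#' * total} ({total})" for branch, total in sorted(counts.items())
    )

def preserve(rows):
    """Preserve: serialize the cleaned dataset so it can be stored and re-used."""
    return json.dumps(rows, indent=2)

records = clean(process(collect(RAW_CSV)))
book_counts = analyze(query(records, "book"))
report = communicate(book_counts)
archive = preserve(records)
print(report)
```

Each function maps to one bullet in the list above, so a specialist focused on a single stage (for example, cleaning or preservation) could replace that one function without touching the rest.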
Students interested in data science and data librarianship might find work in any of the activities identified above, depending on where in the data lifecycle process they want to specialize. Based on that range of options, in addition to the very broadly defined “data librarian,” a representative list of data-focused jobs includes (among many others):
- Big data engineer
- Business intelligence analyst
- Customer data analyst
- Data acquisitions specialist
- Data analytics manager
- Data analyst/report writer
- Data and information specialist
- Data architect
- Data archivist
- Data asset manager
- Data curator
- Data metadata specialist
- Data modeler
- Data quality manager
- Data services librarian
- Data visualization specialist
- Data warehouse manager
- Database developer
- Governance data quality steward
- Research data librarian
- Scientific data manager
MLIS Skills at Work Report
The MLIS Skills at Work report includes important trends and data needed to prepare for career advancement within the information professions. The slides listed below relate directly to the data science career path. In addition, slides #12, #13, and #14 highlight the skills most valuable to employers.
- See slides #5 through #8 of the report for more detailed information about hiring trends, and slide #21 for representative job titles
- See slide #26 to view sample job titles, job duties, job skills, and technology/standards for data management and analysis
- See also slides #25 (Collection, Acquisition and Circulation), #24 (Cataloging and Metadata), #32 (Reference and Research), and #31 (Outreach, Programming and Instruction) for additional roles within this career pathway
Core Theory and Knowledge
The core theory and knowledge of data science is structured around the key activities encompassed in the data lifecycle and the processes used to derive meaning from that data. In general, core knowledge areas include:
- The goals and uses of data science for decision-making, predictive modeling, and similar business cases
- The data lifecycle and each phase within that lifecycle
- Data science systems and technologies and how to apply them in real-life settings
- The processes by which disparate sources of data can be gathered, formatted for uniformity, and made searchable so as to provide meaningful information and insights
- The primary tools of data management and manipulation such as Hadoop, Splunk, Sumo Logic, and Spark and how to choose the best solution for specific data challenges
- Understanding and being able to apply fundamental data science activities such as data mining, data analytics, data querying, and data visualization in order to elicit and present actionable business or organization insights
- The diverse range of existing and emerging data sources, such as social network interactions, personal “wearable” health/activity monitors, shopping patterns, and similar applications
The MLIS program requires 43 units for graduation. Within those units, six courses (16 units) are required of all MLIS students and must be taken as part of all career pathways: INFO 203, INFO 200, INFO 202, INFO 204, INFO 285, and either INFO 289 or INFO 299. Beyond those six courses, a student is free to select electives reflecting individual interests and aspirations.
If you are interested in this career pathway, you may choose to select from the foundation or recommended course electives listed below. Foundation courses form the foundational knowledge and skills for this pathway. If you can only select a few electives, then choose from the foundation courses. See also the recommended courses in the Areas of Emphasis section below.
The career pathway described here is provided solely for advising purposes. No special designation appears on your transcript or diploma. All graduating students receive an MLIS degree.
- INFO 203 — Online Learning: Tools and Strategies for Success
- INFO 200 — Information Communities
- INFO 202 — Information Retrieval System Design
- INFO 204 — Information Professions
- INFO 285 — Applied Research Methods in Library and Information Science
- INFO 289 or INFO 299 — Culminating Experience
- INFO 220 — Resources and Information Services in Professions and Disciplines Topic: Data Services in Libraries [Select class number and then topic] * Note: this course has a prerequisite (waived for Post Masters Certificate students).
- INFO 246 — Information Technology Tools and Applications: Advanced Topics: Big Data Analytics and Management, Information Visualization, Text and Data Mining, Python [Select class number and then topic]
- INFO 282 — Seminar in Library Management Topic: Project Management [Select class number and then topic]
- INFO 287 — Seminar in Information Science Topics: Cybersecurity, Problem Solving with Data, Collecting and Analyzing Data [Select class number and then topic]
Effective leadership and management (of people and information) is critically important for all types of work environments and clients. We recommend that students also consider selecting courses from the Leadership and Management career path to complement or supplement core skills in other areas.
Areas of Emphasis within the Data Science Pathway
While all students earn an MLIS degree from the iSchool (no special designation appears on academic transcripts or diplomas), students may include Area of Emphasis information about their skill sets on resumes and in cover letters. The iSchool faculty (with input from the Knowledge Organization Program Advisory Committee) developed the recommended courses below for these Areas of Emphasis.
Data & Records Management
This area of emphasis focuses on data analytics, data curation, data management, data preservation, data processing, data querying/mining, data solutions, data sources, discerning meaning from data, and tools and systems.
- INFO 220 — Data Services in Libraries
- INFO 282 — Seminar in Library Management — Topic: Digital Asset Management
- INFO 284 — Seminar in Archives and Records Management
  - Digital Curation
  - Enterprise Content Management and Digital Preservation
  - Tools, Services, and Methodologies for Digital Curation
- INFO 287 — Seminar in Information Science
  - AI and Data Ethics
  - Collecting and Analyzing Data
  - Design Thinking
Data Analytics & Communication
This area of emphasis focuses on data analytics, data communication, data querying/mining, data solutions, data visualization, discerning meaning from data, representing meaning in data, and tools and systems.
- INFO 246 — Information Technology Tools & Applications: Advanced
  - Big Data Analytics and Management
  - Information Visualization
  - Text/Data Mining
- INFO 287 — Seminar in Information Science
  - Design Thinking
  - Health Informatics (See note)
  - Problem Solving with Data – Part One
  - Problem Solving with Data – Part Two
- INFO 293 Introduction to Data Networking
Note: See also INFM 210 Health Informatics.
Faculty pathway advisors are available to help guide you and answer questions about planning a career in their area of expertise.
For an excellent introduction to the field, check out Amy Affelt’s The Accidental Data Scientist: Big Data Applications and Opportunities for Librarians and Information Professionals (Information Today, 2015), available through King Library as an ebook.
To connect with professionals engaged in data-related information work, consider joining the iSchool student chapters of ASIS&T and/or SLA and exploring the associations’ special interest groups, for example SLA’s Data Caucus.
You may also want to check out the following resources.
- Defining Data Librarianship: A Survey of Competencies, Skills, and Training – 2018 article that does a good job of exploring “the skills and knowledge that data librarians utilize and the training that they need to succeed.”
- Research Data Management and Services: Resources for Novice Data Librarians – Great starting point for those ready to immerse themselves in data management resources, including discussion lists.
- Data Scientist: The Sexiest Job of the 21st Century – The Thomas H. Davenport article in the Harvard Business Review (October 2012) that launched thousands of data scientist careers.
- Keeping Up with…Research Data Management – from the Association of College and Research Libraries (ACRL), a solid overview of the role data management plays in academic research.