Enhancing the Effectiveness of Library Outreach Programs: A Sabbatical Research Project


Published: March 25, 2024 by Dr. Michelle Chen 

[Dr. Michelle Chen completed her sabbatical in 2023. CIRI had the pleasure of interviewing her about her sabbatical research project and her advice for faculty applying for sabbatical.]

1. Can you talk about what your sabbatical project was about?

During my sabbatical, I focused on developing a predictive model to streamline and enhance the effectiveness of library outreach programs. This project involved collaboration with a local county library, an industry partner, and an international scholar. Our collective expertise aimed to provide both a technical and multicultural perspective on using data science to improve library services.

The collaborating library currently employs an equity-based model for outreach, considering factors like demographics, GreatSchools.org ratings, Title I school information, and free lunch ratios. While effective, this manual model is labor-intensive and time-consuming. To address staffing and budget limitations, I aimed to automate this process. Using data mining and machine learning techniques, I developed a classification model designed to answer four key questions to optimize outreach efforts:

  • Developing metrics for outreach target identification: Which metrics are most influential in pinpointing populations that would benefit from outreach?
  • Prioritizing outreach: How can potential targets be ranked to ensure optimal resource allocation?
  • Customized outreach support: What type of outreach best suits the needs of a specific community?
  • Learning from experience: How can past outreach data be incorporated to continually improve future efforts?

This project used publicly available data from sources like GreatSchools.org and census data to train the classification model. The model then delivered results in two ways:

  • Binary classification determined if outreach was necessary for a specific area.
  • Multinomial classification recommended the most suitable type of outreach support for a particular community.
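To make the two outputs concrete, here is a minimal sketch in plain Python. It is not the project's actual model: the data are synthetic, the three indicator scores and the outreach type names are hypothetical, and the labeling rules are made up purely for illustration. It fits a small logistic regression for the binary question, uses a one-vs-rest scheme for the multinomial question, and ranks areas by predicted probability as one possible way to prioritize outreach.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=1.0, epochs=300):
    """Fit binary logistic regression by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, x):
    """Predicted probability of the positive class for one area."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def predict_type(models, x):
    """One-vs-rest: pick the outreach type whose model is most confident."""
    return max(models, key=lambda t: predict_proba(models[t], x))

# ASSUMPTION: synthetic data. Each row holds three hypothetical standardized
# indicator scores for one area (e.g. free lunch ratio, low school rating,
# limited internet access), centred on 0 and scaled to [-1, 1].
random.seed(0)
X = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(200)]

# ASSUMPTION: made-up labeling rules, used only to have something to learn.
needs = [1 if sum(x) > 0 else 0 for x in X]           # binary: outreach needed?
TYPES = ["meal_site_visit", "school_visit", "digital_literacy_class"]
kinds = [TYPES[x.index(max(x))] for x in X]           # type follows largest score

# Binary classification: is outreach necessary for this area?
binary_model = train_logistic(X, needs)

# Multinomial classification: one binary model per hypothetical outreach type.
type_models = {t: train_logistic(X, [1 if k == t else 0 for k in kinds])
               for t in TYPES}

# Accuracy on the training data itself (a real evaluation would hold data out).
binary_acc = sum((predict_proba(binary_model, x) > 0.5) == (t == 1)
                 for x, t in zip(X, needs)) / len(X)
type_acc = sum(predict_type(type_models, x) == k
               for x, k in zip(X, kinds)) / len(X)

# Prioritization: rank areas from highest to lowest predicted need.
ranked = sorted(range(len(X)),
                key=lambda i: predict_proba(binary_model, X[i]), reverse=True)

print("binary accuracy:", round(binary_acc, 2))
print("type accuracy:", round(type_acc, 2))
print("top 5 areas to prioritize:", ranked[:5])
```

In practice the features would come from the public sources named above, and a library would more likely rely on an established machine learning package than on a hand-written trainer; the sketch only shows how the binary and multinomial outputs relate to one another.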

Moving forward, the library will evaluate and test the model’s effectiveness and efficiency using its existing benchmarks.

2. What motivated you to work on this project during your sabbatical?

My research has primarily focused on applying data science to non-profit and government organizations. However, the applications have often been inward-facing, aiming to improve internal operations or user retention. While these goals are important, a recent shift in the data science community has inspired me to explore its potential for the broader social good. This movement, known as “data science for social good,” involves using data science for positive social impact, providing data support to non-profits, and ensuring data science is used to positively and equitably benefit people. The digital divide resulting from the pandemic further highlights the need for data-driven solutions that address societal challenges, particularly for marginalized populations, and the need to treat data science as a social and cultural tool rather than just a technical one. This growing emphasis on data science for social good has truly ignited my passion to explore its potential within libraries. What does this mean for the field of Library and Information Science (LIS)? How can LIS-oriented data science research contribute to tackling societal challenges? How can we conduct data science research through a balanced, socio-technical lens to maximize its impact and relevance in LIS?

Inspired by these questions, two key themes have emerged at the intersection of data science for social good and LIS. One theme proposes expanding libraries’ role beyond information access, positioning them as key partners in data creation. This focus on inclusive data collection would strive to represent the social and economic realities of their communities more accurately, particularly for marginalized populations. The other theme emphasizes expanding existing programs to reach these underserved communities. This involves evaluating the effectiveness and reach of current services like electronic resources, while also exploring the implementation of new programs tailored to their specific needs. These evolving trends have fueled my research, which aims to develop an efficient and effective method for libraries to identify the marginalized communities most in need of outreach efforts.

3. Any surprises or challenges during this project?

A significant challenge encountered during this project was the limited availability of data, particularly for marginalized populations. Addressing this data gap effectively would empower library staff to allocate resources strategically toward acquiring the most relevant data for key metrics.

In data mining, a crucial step in developing a robust predictive model involves selecting the most “cost-effective” instances for inclusion in the training dataset. This project thus adopted the concept of cost-effective active learning (Saar-Tsechansky & Provost, 2004). Active learning iteratively selects data points most beneficial for model training, leveraging the model’s current knowledge. This approach can significantly reduce the number of data points needed to achieve optimal model accuracy, minimizing data acquisition costs for libraries with limited budgets.
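The iterative selection described above can be sketched as a small pool-based loop. This is an illustration under stated assumptions, not the project's implementation: it uses plain uncertainty sampling (query the point whose predicted probability is closest to 0.5) as a simpler stand-in rather than the sampling strategy of the cited paper, the pool of areas is synthetic, and `acquire_label` is a hypothetical function standing in for the costly step of obtaining a true label.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit(X, y, lr=1.0, epochs=300):
    """Fit binary logistic regression by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def prob(model, x):
    """Predicted probability that an area needs outreach."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# ASSUMPTION: synthetic pool of 300 areas, each with two hypothetical
# standardized indicator scores in [-1, 1].
random.seed(1)
pool = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(300)]

def acquire_label(x):
    """Hypothetical stand-in for paying to obtain a true label (made-up rule)."""
    return 1 if x[0] + x[1] > 0 else 0

# Start with a small labeled seed set; every other label is still "unpaid".
labeled_idx = list(range(10))
labels = {i: acquire_label(pool[i]) for i in labeled_idx}

for _ in range(20):
    # Retrain on everything labeled so far.
    model = fit([pool[i] for i in labeled_idx], [labels[i] for i in labeled_idx])
    # Query the unlabeled area the current model is least sure about.
    unlabeled = [i for i in range(len(pool)) if i not in labels]
    query = min(unlabeled, key=lambda i: abs(prob(model, pool[i]) - 0.5))
    labels[query] = acquire_label(pool[query])
    labeled_idx.append(query)

model = fit([pool[i] for i in labeled_idx], [labels[i] for i in labeled_idx])

# Checking against the whole pool is only possible here because the data are
# synthetic; in a real setting most of these labels would never be acquired.
accuracy = sum((prob(model, x) > 0.5) == (acquire_label(x) == 1)
               for x in pool) / len(pool)

print("labels acquired:", len(labels), "of", len(pool))
print("accuracy on the full pool:", round(accuracy, 2))
```

The point of the design is that only the seed set plus one label per round is ever paid for, which is where the potential saving in data acquisition cost comes from.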

4. Any suggestions you’d like to give to people who would like to apply for sabbatical leave?

A successful sabbatical application hinges on thorough preparation. Begin by researching your university’s sabbatical policy well in advance. This allows ample time to understand eligibility requirements, application deadlines, and funding opportunities. Next, reach out to your department chair and/or colleagues to discuss your proposed project and its alignment with departmental and university goals. A well-defined project with a clear scope is critical. This ensures you develop a feasible and financially sound timeline, maximizing the impact of your research leave. Furthermore, consider the potential research, scholarly, and creative activities (RSCA) that might emerge from your project. A well-defined project facilitates the planning and management of a productive pipeline for these RSCA outputs. Finally, maintaining open communication with colleagues and your department throughout your leave is highly recommended. This fosters a supportive network and ensures a smooth transition back to your regular workload upon your return.


Saar-Tsechansky, M., & Provost, F. (2004). Active Sampling for Class Probability Estimation and Ranking. Machine Learning, 54(2), 153–178.

