3.1 Bike Rentals (Regression)

This dataset contains daily counts of rented bicycles from the bicycle rental company Capital-Bikeshare in Washington D.C., along with weather and seasonal information. The data was kindly made openly available by Capital-Bikeshare. Fanaee-T and Gama (2013) added weather data and season information. The goal is to predict how many bikes will be rented depending on the weather and the day. The data can be downloaded from the UCI Machine Learning Repository.

New features were added to the dataset, and not all of the original features are used in the examples in this book. The following features were used:

  • Count of bicycles including both casual and registered users. The count is used as the target in the regression task.
  • The season, either spring, summer, fall or winter.
  • Indicator whether the day was a holiday or not.
  • The year, either 2011 or 2012.
  • Number of days since 01.01.2011 (the first day in the dataset). This feature was introduced to account for the trend over time.
  • Indicator whether the day was a working day or a weekend.
  • The weather situation on that day. One of:
    • clear, few clouds, partly cloudy, cloudy
    • mist + clouds, mist + broken clouds, mist + few clouds, mist
    • light snow, light rain + thunderstorm + scattered clouds, light rain + scattered clouds
    • heavy rain + ice pellets + thunderstorm + mist, snow + mist
  • Temperature in degrees Celsius.
  • Relative humidity in percent (0 to 100).
  • Wind speed in km per hour.
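A few of the features above are derived from the calendar date rather than measured. The following Python sketch shows one way to compute the year, the days-since-start trend feature and the working-day indicator for a single day, and to map a weather code to a label. It is an illustration, not the book's processing script: the assumption that weather codes 1 to 4 follow the order of the weather list above, the short label strings, the function names and the `holiday` argument (which would come from a holiday calendar) are all hypothetical.

```python
from datetime import date

# First day in the dataset; the trend feature counts days from here.
START = date(2011, 1, 1)

# Assumption: integer codes 1-4 follow the order of the weather list above.
# The short labels are hypothetical names chosen for this sketch.
WEATHER_LABELS = {
    1: "GOOD",
    2: "MISTY",
    3: "RAIN/SNOW/STORM",
    4: "HEAVY RAIN/SNOW",
}


def days_since_2011(day: date) -> int:
    """Number of days elapsed since 01.01.2011 (0 for the first day)."""
    return (day - START).days


def derive_features(day: date, weather_code: int, holiday: bool) -> dict:
    """Build a few of the listed features for one day (illustrative only)."""
    is_weekend = day.weekday() >= 5  # Saturday = 5, Sunday = 6
    return {
        "yr": day.year,
        "days_since_2011": days_since_2011(day),
        "holiday": int(holiday),
        # Working day: neither a weekend day nor a holiday.
        "workingday": int(not is_weekend and not holiday),
        "weather": WEATHER_LABELS[weather_code],
    }
```

For example, `derive_features(date(2012, 12, 31), 2, holiday=False)` gives `days_since_2011` equal to 730, because 2011 has 365 days and 31 December 2012 is the 366th day of 2012.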

For the examples in this book, the data has been slightly processed. You can find the R processing script in the book's GitHub repository, together with the final RData file.
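One plausible part of such processing is converting the weather columns back to physical units. The sketch below assumes, based on the UCI documentation for the daily file, that temperature is stored as (t - t_min) / (t_max - t_min) with t_min = -8 and t_max = 39 degrees Celsius, that humidity is divided by 100 and that wind speed is divided by 67. These constants and the `denormalize` helper are assumptions for illustration, not taken from the book's R script.

```python
# Assumed scaling constants for the UCI daily file (not from the book's script).
T_MIN, T_MAX = -8.0, 39.0  # temperature stored as (t - T_MIN) / (T_MAX - T_MIN)
HUM_MAX = 100.0            # humidity stored as a fraction of 100
WIND_MAX = 67.0            # wind speed stored as a fraction of 67


def denormalize(temp_norm: float, hum_norm: float, wind_norm: float):
    """Undo the assumed min-max / max scaling and return physical units."""
    temp_c = temp_norm * (T_MAX - T_MIN) + T_MIN  # degrees Celsius
    hum_pct = hum_norm * HUM_MAX                  # relative humidity in percent
    wind = wind_norm * WIND_MAX                   # wind speed
    return temp_c, hum_pct, wind
```

With these constants, `denormalize(0.5, 0.5, 0.25)` returns `(15.5, 50.0, 16.75)`.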


  1. Fanaee-T, Hadi, and Joao Gama. "Event labeling combining ensemble detectors and background knowledge." Progress in Artificial Intelligence (2013): 1–15. Springer Berlin Heidelberg. doi:10.1007/s13748-013-0040-3.