Construction of Histogram with Variable Bin-Width Based on Change Point Detection
For a given set of samples with a numeric variable and a set of nominal variables, we address the problem of constructing a histogram of K bins with variable widths, such that relatively many narrow bins cover the ranges where the numeric values are densely distributed and change substantially, while a few wide bins cover the remaining ranges, and of identifying the characteristic nominal values that describe these bins as annotation terms. To this end, we propose a new method that combines a change point detection method applied to the numeric values, based on an L1 or L2 error criterion, with a method for identifying annotation terms for the bins, based on the z-score with respect to the distribution of nominal values. In experiments using four datasets of humidity deficit (HD) collected from vinyl greenhouses, we show that (1) the proposed method constructs more natural histograms, with appropriate variable bin widths, than the equal-width histograms constructed by the standard method based on the square-root choice or Sturges’ formula; (2) the histograms constructed with the L1 error criterion have more desirable properties than those constructed with the L2 error criterion; and (3) the method produces a series of naturally interpretable annotation terms for the constructed bins.
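The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the search procedure or the exact z-score formula, so the code below makes two labeled assumptions. First, change points are found by exhaustive dynamic programming over the sorted values, where each segment is summarized by its median (L1) or mean (L2). Second, the z-score of a nominal value in a bin compares its observed count with the count expected under the global proportion, using a binomial standard deviation. The function names `segment_error`, `change_points`, and `annotation_z_scores` are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import mean, median


def segment_error(values, criterion):
    """Error of approximating `values` by a single constant.

    L2 uses the mean and squared deviations; L1 uses the median and
    absolute deviations.
    """
    if criterion == "L2":
        center = mean(values)
        return sum((v - center) ** 2 for v in values)
    center = median(values)
    return sum(abs(v - center) for v in values)


def change_points(sorted_values, k, criterion="L1"):
    """Split sorted values into k contiguous segments with minimal total error.

    Returns the start indices of segments 2..k. This is an assumed
    exhaustive dynamic program, practical only for small inputs.
    """
    n = len(sorted_values)
    inf = float("inf")
    # cost[j][i]: best total error for the first i values using j segments
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for s in range(j - 1, i):
                c = cost[j - 1][s] + segment_error(sorted_values[s:i], criterion)
                if c < cost[j][i]:
                    cost[j][i], back[j][i] = c, s
    # Walk the back-pointers from the last segment to the second one.
    cuts, i = [], n
    for j in range(k, 1, -1):
        i = back[j][i]
        cuts.append(i)
    return sorted(cuts)


def annotation_z_scores(bin_labels, all_labels):
    """Assumed z-score of each nominal value within one bin.

    z = (observed count - n * p) / sqrt(n * p * (1 - p)), where n is the
    bin size and p is the value's proportion over all samples.
    """
    n, total = len(bin_labels), len(all_labels)
    bin_counts = Counter(bin_labels)
    scores = {}
    for label, global_count in Counter(all_labels).items():
        p = global_count / total
        sd = sqrt(n * p * (1 - p))
        if sd > 0:  # skip values that occur in every sample
            scores[label] = (bin_counts[label] - n * p) / sd
    return scores
```

Under these assumptions, the indices returned by `change_points` delimit the variable-width bins in the sorted data, and the nominal values with the largest z-scores in a bin would be candidates for its annotation terms.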
Keywords: Histogram, Change point detection, Variable bin-width, Visualization
This material is based upon work supported by JSPS Grant-in-Aid for Scientific Research (C) (No. 18K11441), (B) (No. 17H01826) and Early-Career Scientists (No. 19K20417).