\documentstyle[czech]{asu2}
%\documentstyle[11pt]{article}
\hyphenation{sun-spot}
\hyphenation{lengths}
\begin{document}
%\setcounter{page}{1}
\title{~\\
{Daily relative sunspot numbers 1818~--~1848:}\\
{Reconstruction of missing observations}}
\author{Vojt\v{e}ch Letfus}
\institute{
Astronomical Institute, Academy of Sciences of the Czech
Republic\\ 25165 Ond\v{r}ejov, Czech Republic}
\abstract{
The missing daily relative sunspot numbers in the time
interval 1818 -- 1848 were reconstructed by a nonlinear
two-step method of interpolation. In the first step,
gaps no longer than five days were interpolated directly. In the
second step, the data were sorted in the so-called Bartels
scheme, i.e. in rows of 27~days arranged successively
in a matrix. The missing data of longer gaps were
interpolated columnwise, i.e. the missing value at any
position was interpolated from the data at the same positions
of the preceding and following rows. The procedure makes it possible
to interpolate long gaps while respecting the 27-day
variation of the data. Appendix A contains annual tables of
daily data, Appendix B gives monthly and annual means, and Appendix
C presents annual plots of the primary data together with those
reconstructed by interpolation. The differences between the
monthly and annual means of primary data and of data
completed by interpolation are small and fluctuate around
zero. Only in the interval 1835 -- 1842, when
observations were less frequent, is the amplitude of the
fluctuations enhanced. The dispersion $\sigma$ of the
differences is $\pm4.3~R$ for the monthly means and
$\pm1.1~R$ for the annual means. The
two-step method of interpolation was tested on the daily data
series of the interval 1918 -- 1948, to which the sequence of
missing daily data in the years 1818 -- 1848 was applied as a
masking function. The differences between the monthly and
annual means of the primary and the modified data are small and
fluctuate around zero, with dispersion $\sigma = \pm2.7~R$ for the monthly
differences and $\pm0.6~R$ for the annual differences. The
small dispersion is evidence of the high reliability of the
relative sunspot numbers derived from observations in the
years 1818 -- 1848 and also of the effectiveness of the
two-step method of interpolation.}
\maketitle
\section{Introduction}
It is well known that the time series of relative
sunspot numbers, which begins in the year 1700 (Waldmeier,
1961), is not homogeneous. Systematic worldwide
observations of the Sun, initiated by Wolf, began in the
middle of the 19th century, after Schwabe (1844) recognized
that the occurrence frequency of sunspots varies in
11-year cycles. From the year 1849 onward there exists an
uninterrupted series of daily relative sunspot numbers,
i.e. a fully reliable data series. Wolf
intended to verify the sunspot cyclicity discovered by Schwabe
over a longer time interval. He made a great effort to
collect all accessible old observations back to the beginning
of the telescopic era. Going back in time, the frequency of known
observations shows large and very irregular variations (Mayaud,
1977). In accordance with the frequency of observations, it
was possible to derive monthly means back to the year 1749
and only annual means back to the year 1700. The observations
collected for the time interval 1610 -- 1700 are so scarce that
only estimates of the epochs of cycle minima and maxima were
%\fig{Frequency distribution of gap lengths in the daily
% relative sunspot numbers in the years 1818 -- 1848.}
%{82}{52}{fig_1.pcx}{b}
possible. This is why the reliability of sunspot numbers
becomes non-uniform going back in time. Some authors
%\bigfig{Monthly percentages of daily observations of sunspots
%(dashed line) from 1818 to 1848 and monthly percentages of
%daily data after interpolation (full line).}
%{160}{80}{fig_2.pcx}{t}
(Arbey, 1956; Mayaud, 1977; Sonett, 1983; Vitinsky, 1991;
Wilson, 1987) have expressed doubts about the
credibility of the relative sunspot numbers derived by
Z\"{u}rich astronomers from observations made before 1849. We
would therefore like to estimate the reliability of
relative sunspot numbers derived from incomplete daily
observations. For this reason we attempted to reconstruct the
missing daily data by a suitable interpolation method and
to compare the completed results with the primary data. The
differences can be considered a measure of reliability.
\section{Data}\vspace*{-.5mm}
The daily relative sunspot numbers published by
Waldmeier (1961) begin with the year 1818. From then until
the year 1848 the observations contain gaps of various
lengths. The total number
of days with missing observations is 3248, i.e. 28.7\% of all
11323 days in this time interval. The most frequent are
one-day gaps, of which there are 922 (28.4\% of all missing days).
The frequency of gaps decreases rapidly with increasing gap
length up to a length of 14 days; the gaps up to this length
together contain 96.1\% of all missing
days. The remaining four gaps have lengths of 17, 21, 36 and 52
days. The frequency distribution of gap lengths is given in
Figure 1.
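
The percentages quoted above can be verified directly from the
counts. As a brief check, assuming only that the interval comprises
31 calendar years of which eight (1820, 1824, \ldots, 1848) are leap
years:
\[
N = 31 \times 365 + 8 = 11323, \qquad
\frac{3248}{11323} \approx 0.287, \qquad
\frac{922}{3248} \approx 0.284 .
\]
The four longest gaps together contain $17 + 21 + 36 + 52 = 126$
days, so that the gaps of up to 14 days contain
$(3248 - 126)/3248 \approx 0.961$ of all missing days, in agreement
with the value given above.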
\section{Method and results}
The frequency distribution of gaps suggests reconstructing
the missing observations by interpolation.
The structure of the distribution also
determines the appropriate interpolation procedure. Gaps of
one or two days
can be interpolated relatively
easily. With increasing gap length, however, it becomes
increasingly doubtful whether the interpolated estimates are
sufficiently reliable. From this point of view the gap lengths
suitable for direct interpolation are limited. The daily relative sunspot
numbers show a 27-day cyclic variation caused by the Sun's rotation
and the non-uniform longitudinal distribution of sunspot groups.
When longer runs of missing data are interpolated directly, the
uncertainty increases because the 27-day variation cannot be
sufficiently respected. Therefore the interpolation procedure
must be divided into two steps.

In the first step we interpolated directly the
gaps of one to five days. In this
way we filled 47.8\% of all missing data. For the
interpolation we used the algorithm of Hiroshi (1972), which
is based on local procedures. This algorithm provides a
nonlinear interpolation that respects the course of the data in
the neighborhood of the interpolated interval.
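
For orientation, the following is a minimal sketch of the kind of
local procedure on which the cited algorithm is based, assuming that
it is the method commonly known as Akima interpolation; the symbols
$x_i$, $y_i$, $m_i$ and $t_i$ are introduced here only for
illustration. Let $(x_i, y_i)$ be the days and values of the existing
observations around a gap and
$m_i = (y_{i+1} - y_i)/(x_{i+1} - x_i)$ the slopes of the segments
joining them. The slope of the curve at the point $i$ is then
estimated from the four neighboring segments only,
\[
t_i = \frac{|m_{i+1} - m_i|\, m_{i-1} + |m_{i-1} - m_{i-2}|\, m_i}
           {|m_{i+1} - m_i| + |m_{i-1} - m_{i-2}|} ,
\]
with $t_i = (m_{i-1} + m_i)/2$ when the denominator vanishes, and the
missing values between two observed points are read from the cubic
polynomial that matches the values and the slopes $t_i$ at both ends.
Since only a few neighboring points enter, the interpolated values
follow the local course of the data.
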

Gaps longer than five days were interpolated in the second step,
which respects the 27-day variation of the daily data. For
this purpose we arranged the time series in the so-called
Bartels scheme, i.e. in rows of 27 days
placed one below another in a matrix. Sunspot activity that persists
at a given longitude for several solar rotations produces a
similar shape of the sunspot number variation at the same
positions in several successive rows of the matrix. The missing
data of a gap extend along a row, and they can be
%\bigfig{The differences between monthly means of primary
%incomplete daily data and of daily data completed by
%interpolation (thin line), the differences between smoothed
%monthly means (thick line), and the dispersion of differences
%between monthly means (two parallel dashed lines).}
%{160}{80}{fig_3.pcx}{p}
%\bigfig{The differences between annual means of primary
%incomplete daily data and of daily data completed by
%interpolation (thin line). The bars represent the dispersion
%of differences between monthly means in individual years and
%the two parallel dashed lines represent the dispersion of
%differences between annual means.}
%{160}{80}{fig_4.pcx}{}
%\bigfig{The differences between monthly means of primary
%complete daily data and of modified daily data completed by
%interpolation (thin line), the differences between smoothed
%monthly means (thick line), and the dispersion of monthly
%means (two parallel dashed lines).}
%{160}{80}{fig_5.pcx}{}
%\bigfig{The differences between annual means of primary
%complete daily data and of modified daily data completed by
%interpolation (thin line). The bars represent the dispersion
%of differences between monthly means in individual years and
%the two parallel dashed lines represent the dispersion of
%differences between annual means.}
%{160}{80}{fig_6.pcx}{}\clearpage
interpolated from the data at the same column
positions in the preceding and following rows. In the individual
columns of the Bartels scheme, gaps with data missing in a
single row were the most frequent; only exceptionally were data
missing in a given column in three successive rows. From
the above we can expect the two-step interpolation
method to yield sufficiently reliable estimates of the missing
daily relative sunspot numbers.
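
The arrangement can be written explicitly. As an illustrative sketch,
assume that the days are numbered $t = 0, 1, \ldots, N-1$ from
1 January 1818 (the starting day of the first row is an assumption
made here; it is not specified above). The day $t$ then falls in row
$r$ and column $c$ of the Bartels scheme,
\[
r = \left\lfloor \frac{t}{27} \right\rfloor , \qquad c = t \bmod 27 ,
\]
so that the $N = 11323 = 27 \times 419 + 10$ days fill 419 complete
rows and ten positions of one further row. A missing value $R_{r,c}$
is estimated from the known values in the same column,
$R_{r \pm 1,c}, R_{r \pm 2,c}, \ldots$ In the simplest, linear case
of data missing in a single row this would reduce to
\[
\hat{R}_{r,c} = \frac{1}{2} \left( R_{r-1,c} + R_{r+1,c} \right) ;
\]
this linear form is shown only as an illustration of the columnwise
principle and does not reproduce the nonlinear method.
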

Appendix A lists annual tables of the daily relative sunspot
numbers, and Appendix B gives their monthly
and annual means for each year of the interval 1818 --
1848. To distinguish them, the interpolated daily data in the
annual tables are printed in italics. Appendix C gives annual
plots of the daily data from the tables of Waldmeier
(1961) before interpolation together with plots of the
daily data after interpolation. The data obtained after the
first step of interpolation, i.e. for gaps no longer than
five days, are connected by a full line; in intervals where
gaps longer than five days occur, the
curve is dashed.
\section{Discussion and summary}
The incompleteness of the daily observations of sunspots in
the years 1818 -- 1848 is shown in Figure 2. The dashed line
represents the percentage of days with known relative
sunspot numbers $R$ in each month. A seasonal
variation with winter minima and summer maxima is clearly seen. The frequency
of observations in the years 1835 -- 1842 is lower than in
other years. In February 1824 daily observations are
missing entirely, and the monthly mean
for this month was interpolated, as noted by
Waldmeier (1961). All missing data were interpolated
except for the first seven days of the year 1818, where
interpolation was not possible; this is shown in
Figure 2 by the full line.

The differences between the monthly means calculated
from the incomplete daily data ($O$) and from the daily data completed
by interpolation ($C$) are shown in Figure 3 by a thin line. For
most of the time interval 1818 -- 1848 the differences
$O - C$ are small and fluctuate around zero. Only
in the interval 1835 -- 1842, when the frequency of daily
observations is lower (see Figure 2), is the amplitude of the
fluctuations clearly enhanced. The dispersion of the
fluctuations, $\sigma = \pm4.3~R$, is indicated by two parallel dashed
lines. The differences between smoothed monthly means, which
are very small, are drawn in Figure 3 by a thick line.
Similarly, the differences between annual means,
shown in Figure 4, are very small. The bars show the dispersion of the
differences between monthly means in individual years, and the
two parallel dashed lines show the dispersion of the differences
between annual means ($\sigma = \pm1.1~R$).
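
The dispersion is not defined explicitly above. A plausible reading,
assumed here, is the standard deviation of the differences: with
$d_i = O_i - C_i$ the difference for the $i$-th month (or year) and
$n$ the number of months (or years) compared,
\[
\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i , \qquad
\sigma = \sqrt{ \frac{1}{n} \sum_{i=1}^{n}
         \left( d_i - \bar{d} \right)^2 } ,
\]
so that the dashed lines in Figures 3 and 4 would lie at $\pm\sigma$.
The symbols $d_i$, $\bar{d}$ and $n$ are introduced here only for
illustration.
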

The two-step method of interpolation can be tested as follows.
The sequence of missing-data gaps of
variable length in a series of equidistant data can be
regarded as a masking function. If we apply this masking
function to a complete series of data and then carry out
the interpolation, we can estimate the efficiency of the
interpolation method by comparing the interpolated data with the
primary ones. The time interval 1918 -- 1948 was chosen as the
most suitable for testing. The differences between the
monthly means of the primary data ($O$) and the monthly means
obtained from the modified data series after interpolation
($C$) are shown in Figure 5 by a thin line. Here, too, the
$O - C$
differences are small and fluctuate around zero, with an
enhancement only in the years 1935 -- 1942. The dispersion
$\sigma$,
marked by two parallel dashed lines, equals $\pm2.7~R$. The
$O - C$
differences between the annual means are shown in Figure 6.
The bars represent the dispersion of the monthly differences in
individual years. The dispersion $\sigma$ of the differences
between annual means, equal to $\pm0.6~R$, is marked by two parallel
dashed lines.
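
The test can be stated compactly. As a sketch in notation introduced
here only for illustration, let $m_t = 1$ if an observation exists on
day $t$ of the interval 1818 -- 1848 and $m_t = 0$ otherwise, so that
$\sum_t (1 - m_t) = 3248$. Both intervals contain
$31 \times 365 + 8 = 11323$ days with leap years at the same
positions, so, assuming that the mask is aligned day by day (a shift
of exactly 100 years), it carries over directly to the complete
series $R_t$ of 1918 -- 1948. With $I_t$ denoting the value
interpolated by the two-step method from the retained days, the
modified series and the compared monthly means are
\[
\tilde{R}_t = \left\{ \begin{array}{ll}
R_t , & m_t = 1 , \\
I_t , & m_t = 0 ,
\end{array} \right.
\qquad
O = \frac{1}{n} \sum_{t \in {\rm month}} R_t , \qquad
C = \frac{1}{n} \sum_{t \in {\rm month}} \tilde{R}_t ,
\]
where $n$ is the number of days in the month. Under this alignment
the years 1935 -- 1942 of the test correspond to the sparsely
observed years 1835 -- 1842.
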

We must take into account that the primary data ($O$) in
the time interval 1818 -- 1848 are incomplete data with gaps,
while in the time interval 1918 -- 1948, used for our testing,
the primary data are complete, without gaps. Thus the larger
dispersion in the first case, compared with
that obtained in the test, is to be expected. Nevertheless, the small
differences between the dispersions in the two cases, and also
their small absolute values, are evidence of the high
reliability of the relative sunspot numbers derived from
observations in the years 1818 -- 1848 and also of the
effectiveness of the two-step interpolation method.
%\vspace{1pt}
%{\Large\bf~~~\\Acknowledgements}
\acknowledgement
This work was supported by the Academy of Sciences of
the Czech Republic under Grant no. K1-003-601.
\references
\ritem Arbey L.: 1956, {\em Bull. Astron.} {20}, 347.
\ritem Hiroshi A.: 1972, {\em Comm. ACM} {15}, 914.
\ritem Mayaud P.N.: 1977, {\em J. Geophys. Res.} {82}, 1271.
\ritem Schwabe H.: 1844, {\em Astron. Nachr.} {21}, 233.
\ritem Sonett C.P.: 1983, {\em J. Geophys. Res.} {88A}, 3225.
\ritem Vitinsky Yu.I.: 1991, in {\em `Problemy solnechnoj aktivnosti'},
Fiz.-Tech. Inst. AN SSSR, Leningrad, %\\ \>
p. 43 (in Russian).
\ritem Waldmeier M.: 1961, {\em The Sunspot Activity in the Years 1610
-- 1960}. Schulthess and Co., %\\ \>
Z\"{u}rich.
\ritem Wilson R.M.: 1987, {\em Solar Phys.} {108}, 195.
\end{document}