To build a home server or not, and how does it relate to RAID?
Why have a home server at all?
The answer is: just about anything works at home, and for sure anything that can run my favorite application, Serviio (love it).
Any server works; if you keep your data backed up, just about any box can function quite well.
Speed at home is not a big deal (avoid WiFi for streaming), nor is the number of users. What, two TVs streaming video at the same time, not 100? (No plans to start your own Netflix clone?)
I will assume you didn't buy a turnkey server (the box has everything you need, they held your hand through the setup process, and all is done).
Next up is buying a real naked server, say a used real server from HP or Dell, two great choices and very robust (top quality). (Naked means no OS, no operating system, yet.)
A good server is an HP DL380, G7 or newer; a G7 can be had now for $100 with no media drives. Then you buy some very good SAS drives; 15k 2.5" drives of any size are very good.
My HP DL380 above matches every line below (100% full blown, with HW RAID too). The best HP RAID card is the P420 (it can work in RAID or HBA mode with one bit flip). Robust and very versatile.
A real server has all or most of the features below.
There are many related issues.
The best quote I've read on reliability:
"?Never fail? What does that mean?
Fail in a controlled way that can easily be mitigated?
Operational costs over a lifetime?Assume they're going to fail and plan for it. Then, they'll always be reliable."
My comment is: it will fail, plan on it, and be backed up, done the correct way (use different media and never overwrite the last full backup; gee, keep them all if serious).
SAS drives (enterprise, not toy SATA) are being upgraded rapidly in server farms worldwide. For databases or SQL, fast is king, so SSD; but for web pages, HDD is plenty good. Those guys call HDDs "rust"; I think that is funny.
The farms typically upgrade every 3 years (and use their own failure tracking to base all decisions on what and when to upgrade).
Home users wait until it fails dead as a doornail (as a rule, but not always). Then they cry "bit rot" and "RAID5 holes" (sad to see and hear). Gee, it's expected: old HDDs fail, and do so often if you have many spindles spinning.
SSDs are faster, so fewer servers are needed now (and the idea of hot and cold data access, wow).
If all this seems a crashing bore (it is), buy a box, any box, even some old lame PC, stuff in an appropriately sized HDD or SSD, run it, and keep it backed up.
Set up the shares and pretend the box is a server (home use).
Windows 10 will run DLNA for free, and with software RAID too, if you want to play there.
Or run Plex like this.
If you are a serious server builder or buyer, this is what you need.
ECC RAM is mandatory, IMO (in my opinion), for any real server; watch this very good video (and learn). It even protects you from cosmic-ray data corruption (or ESD events, minor lightning surges, etc.).
What good is an ECC hard drive if you send garbage to it from RAM? (That is the grand contradiction, if you do that.)
ZFS (and for sure RAID-Z3):
No lie, ZFS is the bee's knees, but then again there is nothing at all wrong with any modern HP RAID card and server, nor any top-grade RAID card from, say, LSI (Avago) (SAS-9361?) plus the MSM manager.
The number one reason home users (not pros) get caught with their pants down is this: not looking and not monitoring.
Real HP servers watch this 24/7 and email you that drive X is near dead or rapidly dying (believe it); or you learn to read the C6 parameter, and if param 05 grows way too fast, the DRIVE IS BAD.
Letting drives fail in a RAID, ignored by you? Why run RAID at all?
If using RAID 5 and a 2nd drive dies, the RAID is dead. If running RAID 6, then a 3rd drive fails, or the 3rd dies while the 2nd is rebuilding, and boom, the end of the road is now.
It is not some damned named WRITE HOLE as professed; in most of all cases of this, the owner is running old HDDs (5+ years) or SATA trash drives that are known to fail at a 26% rate at the 4-year mark.
The owner also fails to run a RAID manager that reports live which drives are failing NOW, and does not understand at all that SMART tells you, and has predictive warnings now, for sure on HP RAID cards or systems by HP.
Or, on a Dell server, never learned to read the DRAC and RAID log files.
I can't speak for Dell, but am told it has the same abilities. (If you must run cheap systems, at the least watch C6 and 05 like a hawk, or you will get into trouble.) Factoid: SSDs use different parameters and other things; go watch those.
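The "watch C6 and 05 like a hawk" advice can be automated. Below is a minimal sketch (my own, not HP's or smartmontools' tooling) that parses the attribute table printed by `smartctl -A /dev/sdX` and flags the two parameters discussed above: 05 (Reallocated_Sector_Ct) and C6 hex, i.e. 198 decimal (Offline_Uncorrectable). The sample text uses the usual smartmontools column layout with illustrative values; adapt to what your tool actually prints.

```python
# Hedged sketch: scan smartctl-style attribute output for the two
# early-warning parameters: 05 (reallocated) and C6/198 (uncorrectable).
def parse_smart_attributes(text):
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0].isdigit():       # attribute rows start with the ID number
            raw = parts[-1]                    # RAW_VALUE is the last column
            attrs[int(parts[0])] = int(raw) if raw.isdigit() else raw
    return attrs

def drive_warnings(attrs):
    warnings = []
    if attrs.get(5, 0) > 0:
        warnings.append(f"param 05: {attrs[5]} reallocated sectors (watch for growth)")
    if attrs.get(198, 0) > 0:
        warnings.append(f"param C6: {attrs[198]} offline-uncorrectable sectors (replace the drive)")
    return warnings

# Sample output in the usual smartmontools column layout (illustrative values):
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       3
"""
for warning in drive_warnings(parse_smart_attributes(sample)):
    print(warning)
```

Run this from cron and mail yourself the output, and you have a poor man's version of what the HP card does for you automatically.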
You may see the below after a full data scrub run by you.
I can also boot to a Linux disk and see the same thing. C6 is a fail here!
One more point: it's not much use to back up this HDD now, after never doing backups for years (it will be full of corrupted data, but hope springs eternal).
There are two common failures here: just a bad single drive, or losing power on the system while not owning a BBU for the write-back cache (or the BBU battery is dead).
The other hole is when you have a failed drive and then one more fails while the thing is begging you for a replacement for the 1st dead drive; or it's now in the drive rebuild process, and one more drive fails.
Yes, RAID 6 can resync 2 drives at once, but not 3. If you want to call that a hole, OK, but it is not really, is it? (The poor thing, the RAID brain, is just lost, totally.)
If you had run the array manager, you'd have seen that fact.
Best is to consider all these things, laid out here (left panel index here).
HDD life spans are first explained using the classic bathtub curve (the left side is called infant mortality). There are massive collected logs, from huge server farms' data, to back up this fact.
In most every case of heavy usage, the drives do fail at 5 years (if lucky, and not sooner, as some do).
What matters most is product tier (Enterprise is best; learn the specs, which will be 90% duty cycle or even 24/365), unlike 8-hour-a-day consumer drives; that means the data sheets cannot be compared here (on life).
Consider how short-lived cheap PC HDDs can be.
If you think you have problems, read Google's 1,800-server farm stats (ouch, how painful to imagine).
Try hard not to read MTBF stats; after all, if you have 100,000 drives that each have an MTBF of 100,000 hours, then you should expect a drive to fail, on average, every hour.
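The fleet arithmetic in that sentence works out like this (a trivial sketch; N drives, each rated at MTBF hours, gives an expected time between failures across the whole fleet of MTBF / N hours):

```python
# Expected hours between failures across a fleet of drives:
# each drive fails on average once per mtbf_hours, so the fleet as a
# whole sees a failure roughly every mtbf_hours / n_drives hours.
def fleet_hours_between_failures(n_drives, mtbf_hours):
    return mtbf_hours / n_drives

# 100,000 drives, each with a 100,000-hour MTBF:
print(fleet_hours_between_failures(100_000, 100_000))  # → 1.0 hour per failure, fleet-wide
```

That is why a huge MTBF on the data sheet means nothing for any single drive, and why farms see dead drives every single day.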
What matters is heat, vibration (a dropped laptop while running? ouch), and power surges (and luck).
I have zero idea who makes the best HDD or SSD (and that changes, what, every 3 months, as new designs roll into the market).
This is a noble attempt, but it is based on products not made now (or anywhere near now). Or this paper:
All information is rearward-looking, but most or all products are forward-looking; so buy a top brand, get Enterprise grade, and pay extra if wise.
But I do buy Enterprise grade every time. (Bottom fishing nets you flounders or mud suckers of some kind.)
Do not buy refurbs; most are just dusted-off drives, unless sold direct from the maker.
Many drives sold as new on fleabay are in fact refurbs.
There is luck too.
What kills drives is heat, then shock, and next is duty cycle. (Heat kills all electronics sooner; ask any product engineer, or look at MIL-spec testing, which uses heat, and fast cycling that adds more heat, to shake out the early failures in electronics.)
LAST: a quote from a smart guy.
"MTBF is just that. "mean". An average. Some drives will survive 30+ years and some will last five minutes, without invalidating the average. Also there is will caveats with any MTBF (usually in the small print somewhere) like how many power cycles and spin-up/spin-down cycles are assumed over a given time period, and the fact that the MTBF assumes perfect conditions that no drive experiences for its whole life in the real world. – David Spillett"
Thanks, David; that post cuts to the chase.
The HARDWARE RAID versus SOFTWARE RAID issue: (this is a moving target, so do not fail to know that) features are added every quarter, is my guess.
The boot and virus issues are what give me pause. This list is old and many items have changed, but it makes a nice hit list to check any RAID system against before taking the plunge, or at all.
The ZFS gang will take issue with this list. (Make them prove it; they can, on some items.)
BIT ROT (oxymoron?) (yes, sectors fail, but bits never ever rot)
The folks using this silly phrase are only confusing the facts; in fact, that is not an IT- or industry-used phrase. They should say "data corruption," or file corruption.
They could call it Sector Rot and be way more accurate. Even Sector Fungus. LOL.
Please stop calling it ROT, OK? (Only dead plants and animals rot.) I call it pure TOMMY ROT.
The next lie is they pretend that sectors last forever; sorry, that dream is a lie too. Learn: Backup, Backup, Backup!
Learn to roll them; do not overwrite older backups. ROLL them, stack them like you do cut firewood.
Learn that mitigation is your job. (Sorry, humans still run the earth, not robots; no SKYNET yet, LOL.)
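The "roll them, never overwrite" idea can be sketched in a few lines: every full backup gets a dated name, so older sets stack up like firewood instead of being clobbered. The naming scheme here is my own illustration, not any backup tool's convention.

```python
# Sketch of rolled (never-overwritten) full backups: a dated name per
# set means writing a new backup can never destroy an older one.
from datetime import date

def next_backup_name(prefix, when=None):
    when = when or date.today()            # default to today's set
    return f"{prefix}-full-{when.isoformat()}.tar"

print(next_backup_name("homeserver", date(2018, 5, 1)))  # → homeserver-full-2018-05-01.tar
```

Point any backup script at names like these and the last known-good set always survives a bad run.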
Your job, or my HP RAID card's job, is to tell me before drives are dead, or near dead, or are dead now, or are full of dead sectors, as seen here.
The reason you have corruption is that you failed to look first. Then you suffered the OS telling you the files are corrupted now (too late).
One silly Google report says SMART is not effective. Well, gee, guys, you ran that test way before SMART matured around 2010. Wow, just wow, what silly comments by them. (SMART evolved over time. TIME MATTERS.)
The truth is here, in bullets. (Even NASA has very serious data-corruption issues in spacecraft, from cosmic rays and the nasty solar wind.) All media corrupts: CD/DVD, and all rotating magnetic memories, called HDDs.
Hard disk drives today all have self-healing features to magically hide this problem from you; if not, the HDD would be useless. (If bored, go to Seagate.com and read endless white papers on the topic, for sure on SMART.)
In 99.9% of all cases of consumers crying BIT ROT, it is because they do not monitor their drives at all, or failed to do so in years. (It's like driving your car on bald tires, never once looking at them.) Doomed to fail.
I quote HP here (reading all of the HP documents is best):
" Pre-Failure Warranty using S.M.A.R.T technology We pioneered failure-prediction technology for disk drives by developing monitoring tests run by Smart Array controllers. Called Monitoring and Performance (M&P) or Drive Parameter Tracking, Smart Array controllers externally monitor disk drive attributes such as seek times, spin-up times, and media defects to detect changes that could indicate potential failure. We worked with the disk drive industry to help develop a diagnostic and failure prediction capability known as Self-Monitoring Analysis and Reporting Technology (S.M.A.R.T.). As S.M.A.R.T. matured, we used both M&P and S.M.A.R.T. to predict disk drive failures. S.M.A.R.T. has matured to the point that we rely exclusively on this technology to predict disk drive failure to support Pre-Failure Warranty. Since 1997, all HP SCSI, SAS, and SATA server-class disk drives have incorporated S.M.A.R.T. technology. S.M.A.R.T. disk drives inform the host when a disk drive is experiencing abnormal operation likely to lead to drive failure. S.M.A.R.T. places the monitoring capabilities within the disk drive itself. These monitoring routines are more accurate than the original M&P tests because they have direct access to internal performance, calibration, and error measurements for a specific drive type. HP Smart Array controllers scan the disk drive media during idle time and
repair, or report, any media defects detected. The controllers recognize S.M.A.R.T. error codes and notify HP Systems Insight Manager (SIM) of a potential problem. HP SIM notifies administrators of drive failures. Automatic data recovery with rapid rebuild technology When you replace a disk drive in an array, Smart Array controllers use the fault-tolerance information on the remaining drives in the array to reconstruct the missing data and write it to the replacement drive. If a drive fails during the rebuild, the reconstruction fails and the data is likely to be permanently lost. "
"Smart Array controllers perform a background surface analysis during inactive periods, continually scanning all drives for media defects.
Smart Array controllers can also detect media defects when accessing a bad sector during busy periods.
If a Smart Array controller finds a recoverable media defect, the controller automatically remaps the bad sector to a reserve area on the disk drive.
If the controller finds an unrecoverable media defect and you have configured a fault-tolerant logical drive, the controller automatically regenerates the data and writes it to the remapped reserved area on the disk drive."
HP controllers are NOT TOY GRADE, not by a country MILE.
Data sheets do exist; below is a top one from Seagate, for SAS Enterprise-grade drives. (Do not read these boring facts.)
This is over and above what each disk does by itself, every minute of every day, with ECC corrections.
Each SAS drive from year 2010 on (with the 4K-sector upgrade) has 100-byte ECC now.
"ECC maximum burst correction length of 520 bits for 512-byte blocks and 1400 bits for 4K-byte blocks" (now)
This 100-byte Hamming code allows the drive to discover bad data and correct it on the fly, and it does, and is way better than the 10-year-older drives of lore.
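To make "discover bad data and correct it on the fly" concrete, here is the classic Hamming(7,4) single-bit corrector: a toy stand-in for the far longer codes real drives actually use, but the principle (extra parity bits locate and flip back a corrupted bit) is the same.

```python
# Toy Hamming(7,4): 4 data bits protected by 3 parity bits, able to
# locate and correct any single flipped bit in the 7-bit codeword.
def hamming74_encode(d):                   # d = [d1, d2, d3, d4]
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]    # codeword positions 1..7

def hamming74_correct(c):
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]         # parity check over positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]         # parity check over positions 2,3,6,7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]         # parity check over positions 4,5,6,7
    pos = s1 + 2 * s2 + 4 * s3             # 0 = clean; else 1-based bad-bit position
    if pos:
        c[pos - 1] ^= 1                    # flip the bad bit back
    return [c[2], c[4], c[5], c[6]]        # recover the data bits

code = hamming74_encode([1, 0, 1, 1])
code[4] ^= 1                               # simulate one flipped bit on the media
print(hamming74_correct(code))             # → [1, 0, 1, 1]
```

A drive's firmware does the same kind of thing on every sector read, silently, which is why you only hear about it when the error count (param 05) starts climbing.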
The term "bit rot" is just the inane phrase for data corruption. (It's normal at the below-stated levels; data corruption is very low if, and only if, C6 is not at end of life.)
This is as good as it gets; I am not sure if SSD is better in failure rates over long time spans (the experts say yes, it is).
The experts call an HDD "rust" (as in iron); the SSD is mostly silicon, not rust. Or they call HDDs "spinners" (jargon/slang).
Data sheet excerpts (bold by me; these phrases are Seagate's names only):
Read Error Rates
1. Error rate specified with automatic retries and data correction with ECC enabled and all flaws reallocated.
Less than 10 errors in 10^12 bits transferred (OEM default settings)
Less than 1 sector in 10^16 bits transferred (10^16 bits = 1.25×10^15 bytes, about 1.25 petabytes; one bad sector per ~1.25 PB read, an amazing feat)
Less than 1 sector miscorrected in 10^21 bits transferred (1.25×10^20 bytes, about 125 exabytes; yeah, huge)
Interface error rate:
Less than 1 error in 10^12 bits transferred
Mean Time Between Failures (MTBF): 2,000,000 hours (using 90% duty rules, not weak 20% home rules)
Annualized Failure Rate (AFR): 0.44% (Enterprise grade)
The error rates stated in this manual assume the following: (bold are my changes)
• The drive is operated in accordance with this manual using DC power as defined in paragraph 6.3, "DC power requirements." (spec power no noise)
• Errors caused by host system failures are excluded from error rate computations. (Like using non-ECC RAM to send garbage to the drive. Oops.)
• Random data is assumed. • Default OEM error-recovery settings are applied, including AWRE, ARRE, full read retries, and full write retries; the OS must do this, or it is a JUNK OS (upgrade it).
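For the curious, the unit arithmetic behind the spec excerpts above works out as follows (decimal SI units are my assumption, not Seagate's wording), along with the standard MTBF-to-AFR conversion that reproduces the 0.44% figure:

```python
import math

def bits_to_bytes(bits):
    return bits / 8

# One unrecoverable sector per 10^16 bits read:
petabytes_per_bad_sector = bits_to_bytes(10**16) / 1e15
print(petabytes_per_bad_sector)          # → 1.25 (PB read per expected bad sector)

# One miscorrected sector per 10^21 bits:
exabytes_per_miscorrect = bits_to_bytes(10**21) / 1e18
print(exabytes_per_miscorrect)           # → 125.0 (EB per expected miscorrection)

# MTBF of 2,000,000 hours → annualized failure rate (8766 h ≈ one year):
afr = 1 - math.exp(-8766 / 2_000_000)
print(f"{afr:.2%}")                      # → 0.44%
```

So the two headline numbers on the data sheet, MTBF and AFR, are really the same spec stated two ways.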
The above proves there is no such thing as zero file corruption (the BIT ROT FUD name) on any drive made.
Only how often matters (to you, or anyone).
Here is a list of drive failures (all common):
Almost every drive (short of sudden total death) shows signs that it is at its end of life; every one does that, not just going totally dead.
SMART even lets you watch sectors failing faster (repairs, param 05), and this is the early sign to get rid of the drive now; this is how HP predictive-failure systems work, using this technology.
Not only that, but when the OS uses that drive with C6 at max and hits bad sectors, the drive sends HARD ERRORS to the OS (status bit = error, set).
If your OS ignores that, then that OS is JUNK!!! One more FACT.
If your RAID controller engine reads errors (drive-reported hard errors), the engine retries, or goes to the next drive (round-robin fashion is best).
Some RAID engines read, say, all the RAID 5 drives in parallel and return the first drive with an answer. (The drives have 2 ports and are full duplex; we can TX and RX to the drive at the same time, on the 2 ports, A and B.)
This speeds up reading, but if the answer is FAIL, it goes to the next drive ready (the next ACK). If good, then that is what the RAID returns for DATA.
The RAID engine is not stupid, as some profess online (well, on modern systems, not 20-year-old RAID cards).
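What the engine does on a failed read can be sketched with RAID 5's actual math: the parity strip is the XOR of the data strips, so any one missing strip equals the XOR of all the survivors. A toy illustration, not any vendor's firmware:

```python
# RAID 5 recovery math in miniature: parity = XOR of all data strips,
# so a lost strip is regenerated by XOR-ing the surviving strips.
def xor_strips(strips):
    out = bytearray(len(strips[0]))
    for strip in strips:
        for i, b in enumerate(strip):
            out[i] ^= b
    return bytes(out)

d0 = b"hello world!"            # strip on drive 0
d1 = b"raid-5 demo!"            # strip on drive 1
parity = xor_strips([d0, d1])   # strip written to the parity drive

# Drive 1 returns a hard error; the engine regenerates its strip:
rebuilt = xor_strips([d0, parity])
print(rebuilt)                  # → b'raid-5 demo!'
```

This is also why losing a second drive in RAID 5 is fatal: with two strips missing, the XOR no longer has enough survivors to solve for either one.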
The tricky spec above is "Miscorrected Data," but that is one bad sector read in roughly 125 exabytes. (I am not sure I can live that long, nor care.) (This is the 100-byte Hamming code's slip-through, I think.)
NO ECC on earth is 100% dead-nuts accurate (but the above is very good). As they say, do the math, using your data.
The engine, HW or software, is the RAID algorithm; the actual details are mostly a trade secret (unless open source).
All storage media end up corrupted; only how much you can tolerate is in question.
COSMIC RAYS? (And BIT ROT?) (These comments are tongue in cheek. You think you have problems?)
Wiki: learn to read how it works, then see what is real. Yes, your PC IS A TOY and can drop bits (and will).
As an example, the spacecraft Cassini–Huygens, launched in 1997, contains two identical flight recorders, each with 2.5 gigabits of memory in the form of arrays of commercial DRAM chips. Thanks to built-in EDAC functionality, spacecraft's engineering telemetry reports the number of (correctable) single-bit-per-word errors and (uncorrectable) double-bit-per-word errors. During the first 2.5 years of flight, the spacecraft reported a nearly constant single-bit error rate of about 280 errors per day. However, on November 6, 1997, during the first month in space, the number of errors increased by more than a factor of four for that single day. This was attributed to a solar particle event that had been detected by the satellite GOES 9.
I think the Voyager spacecraft had hard memory failures, looking like a shotgun blast had hit it, many years later (in the news then), for the Saturn fly-by?
Some are just bit flips, not physical damage. The 2 computers can be software-upgraded in flight, and were.
They (JPL?) uploaded new binary code and data to skip past the damaged cells, the cosmic-ray-damaged cells. (A very tedious job that I do NOT envy, but OMG, I love you guys!)
At the same time they uploaded new code that compresses photos greatly (like we do today), and then, like magic, Voyager sends photos that are near heart-stopping good.
Now the paranoia of FUD and file corruption.
There are only so many file types (vast, but the set can be reduced).
Back up to alternate media (BD-R, etc.).
If you are a hoarding hound, get TAPE, or high-end DAT. It too uses ECC, or Reed-Solomon.
LTO-6? Reed-Solomon ECC, error rate of 1 in 1×10^17 bits, used in a $1000 TL2000 (1/7th of new cost).
As you can see, no device made on earth is error-free; sorry to break anyone's bubble on that. RTM (read the manual?).
Get this from HP or Dell: an LTO-6 Ultrium 6 external SAS tape backup drive?
If, say, owning 20 external HDDs for backup (then going offline) is too much, try the above. (20 x 1TB drives? Time to upgrade to this.)
20 TB is a good boundary for heavy iron, heavy rust-powered TAPE (cost-wise, and tape-life-span-wise).
Virtualization? Why not use that? Load this to bare metal.
The below has a free version, but the real one is super expensive, and there are no cheaper home versions. The free one is crippleware, but if you like the basics, it is FREE. V6.
You buy VMware's ESXi.
Get it for free, here (home use).
Using the HP offline tools (free, BTW), SSA v3: run this and build any array you want, even 2 arrays, one for the OS, a 2nd for data.
Do not use the ROM BIOS to build the array; use SSA (free of cost at HPE.com), the Smart Storage Administrator, offline; you boot the CD-R. Easy.
Install the VM host, then load, say, Windows Server 2012 R2 as a VM, and after you're happy with it, back that up as an image using ESXi, so if the "R2" server gets corrupted, you come back fast. (VMs are super cool in this way.)
You can install whatever you want as a VM, any OS that ESXi allows (RTM).
Many folks love the IDEA of image backups, for good reasons (it is fast).
version 1. 05-1-2018